(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(19) World Intellectual Property Organization 

International Bureau 

(43) International Publication Date 
25 July 2002 (25.07.2002) 




(10) International Publication Number 

PCT WO 02/058355 A2 



(51) International Patent Classification 7: H04L 29/00

(21) International Application Number: PCT/GB02/00128

(22) International Filing Date: 15 January 2002 (15.01.2002)

(25) Filing Language: English

(72) Inventors: CRANFORD, Hayden, Clavie; 6900 Branton Drive, Apex, NC 27502 (US). NORMAN, Vernon, Roberts; 821 Summerwinds Drive, Cary, NC 27511 (US). SCHMATZ, Martin, Leo; Teufenerstrasse 158, CH-9012 St Gallen (CH).

(74) Agent: BURT, Roger, James; Intellectual Property Law, Hursley Park, Winchester, Hampshire SO21 2JN (GB).



(26) Publication Language: English



(30) Priority Data:
60/262,358 16 January 2001 (16.01.2001) US
09/996,091 28 November 2001 (28.11.2001) US



(71) Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION [US/US]; New Orchard Road, Armonk, New York, NY 10504 (US).

(71) Applicant (for MG only): IBM UNITED KINGDOM LIMITED [GB/GB]; P.O. Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU (GB).



(81) Designated States (national): AE, AG, AL, AM, AT, AU, AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU, CZ, DE, DK, DM, DZ, EC, EE, ES, FI, GB, GD, GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, MZ, NO, NZ, OM, PH, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TN, TR, TT, TZ, UA, UG, UZ, VN, YU, ZA, ZM, ZW.

(84) Designated States (regional): ARIPO patent (GH, GM, KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZM, ZW), Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE, TR), OAPI patent

[Continued on next page] 



(54) Title: SERIAL LINK ARCHITECTURE 



[Front-page drawing (Figure 1): back plane 10 carrying cards 12a and 12b with ASICs 14a and 14b, transmitters 16 and receivers 18, joined by serial buses 20 through a connector, cable and connector]



(57) Abstract: A global architecture for a serial link connection between two cards which must transmit data across wired media is 
provided. The architecture comprises a transmitter portion and a receiver portion. The transmitter portion includes a structure and
circuitry to take digital bits from a first bit register, such as, for example, an eight-bit register or a ten-bit register, and convert these
bits into a serial analog transmission to the receiver portion. The receiver portion includes a structure and circuitry to sample the analog
transmission of the original digital bits, reconvert the analog serial signal into digital bits corresponding to the original digital
bits, and store them in a second bit register, where they correspond to the data stored in the original register from which they were selected.
SERIAL LINK ARCHITECTURE

FIELD OF THE INVENTION 

This invention relates generally to the transfer of data in serial form from a register on one ASIC (application specific integrated circuit) chip on a card to a register on another ASIC chip on a card and, more particularly, to the serial transfer of such data wherein the data is converted from parallel digital form to serial analog form for transfer from one ASIC to the second ASIC and is then reconverted to parallel digital form in the second ASIC after it has been transferred in serial analog form.

BACKGROUND OF THE INVENTION 

Serial data must be transmitted across wired media. The transmit and receive sections include chips wired to one another and card-to-card interconnects. The transmission media can be a combination of printed circuit boards, connectors, back plane wiring, fiber or cable. The interconnect can include its own power, data and clocking sources or may derive these functions from a host module. Such data has typically been transmitted through a parallel data bus, such as ISA, PCI, PCI-X and the like. One drawback of such parallel links is their moderate data transmission rate: improvements in microprocessor performance have resulted in data transfer bandwidths that typically outpace I/O transfer rates. Also, the ASIC I/O count is high. In addition, the system integration I/O count using a parallel data bus is high. Finally, the overall system cost associated with the use of the parallel data bus tends to be high.

Related art shows attempts to overcome these difficulties and drawbacks by utilizing serial communication systems involving a variety of schemes. For example, some have used a carrierless amplitude/phase (CAP) modulation scheme. Others have used linear compression/decompression and digital signal processing techniques for frequency modulation. Still others use a linear (analog) phase rotator to recover only the carrier of an incoming signal. Some transmit using a pass band, which limits the bandwidth of the frequencies being passed, rather than a baseband channel, wherein the signals are not shared and the frequencies are not restricted.
SUMMARY OF THE INVENTION

According to a first aspect of the present invention there is provided a method of transferring stored digital parallel data of multiple bits of data stored in a first data register from a transmitter to a receiver over a hard wired conductor, comprising the steps of:

synchronously converting said stored digital data to a serial analog data signal in said transmitter;

transmitting said serial analog signal asynchronously over said hard wired conductor to said receiver; and

restoring said asynchronous serial analog signal to synchronous digital parallel data in said receiver corresponding to the data stored in said first data register in said transmitter, including detecting both edges of the data in said asynchronous serial analog signal for conversion to parallel data bits.

According to a second aspect of the present invention there is provided a structure for transferring stored digital parallel data of multiple bits of data stored in a first data register, comprising:

a transmitter and a receiver connected by a hard wired conductor;

circuitry to synchronously convert said stored digital data to a serial analog data signal in said transmitter;

circuitry to transmit said serial analog signal asynchronously over said hard wired conductor to said receiver; and

circuitry to restore said asynchronous serial analog signal to synchronous digital parallel data in said receiver corresponding to the data stored in said first data register in said transmitter, including detecting both edges of the data in said asynchronous serial analog signal for conversion to parallel data bits.

The present invention comprises a global architecture for a serial link connection between two cards which must transmit data across wired media. The architecture comprises a transmitter and a receiver. The transmitter includes circuitry and a structure to take digital bits from a bit register, such as, for example, an eight-bit register or a ten-bit register, and convert these bits into a serial analog transmission to the receiver. The receiver includes a structure and circuitry to sample edges of the data in the analog transmission of the original digital bits, reconvert the analog serial signal to the original digital bits, and store them in a register, where they correspond to the data stored in the original register from which they were selected.






DESCRIPTION OF THE DRAWINGS 



Figure 1 is a high level diagram showing a wired interconnection between a transmitter portion and a receiver portion of a serial link;

Figure 2 is a block diagram showing the operation of the circuitry of the transmitter portion of the architecture;

Figure 3 is a block diagram showing the operation of the circuitry of the receiver portion of the architecture;

Figure 4 is an illustration of the control circuit for a phase locked loop;

Figure 5 is a block diagram of a transmitter architecture;

Figure 6 is a block diagram of a receiver architecture;

Figure 7 shows an averaging pattern for a phase rotator control;

Figure 8 shows another embodiment of an averaging pattern for a phase rotator control;

Figure 9 is a block diagram of a transmitter architecture;

Figure 10 is a schematic diagram of a loop filter;

Figure 11 is a schematic of a transmit VCO;

Figure 12 is a schematic of a transmit VCO delay cell;

Figure 13 is a block diagram of a receiver architecture;

Figure 14 is a schematic view of a receiver circuit;

Figure 15 is a schematic view of a differential amplifier;

Figure 16 is a schematic diagram of a receive sampling latch;

Figure 17 is a schematic of a receive VCO;

Figure 18 is a schematic diagram of a latch buffer;

Figure 19 is a schematic diagram of an inverter buffer;

Figure 20 is a block diagram of another embodiment of a dual loop PLL;

Figure 21 is a block diagram of the coarse loop of Figure 20;

Figure 22 is a block diagram of the topography of a phase rotator and phase buffer;

Figure 23 is a schematic diagram of a phase rotator cbias;

Figure 24 is a schematic diagram of a phase rotator currents buffer;

Figure 25 is a block diagram of a phase rotator currents buffer six pack;

Figure 26 is a schematic diagram of a phase rotator currents buffer array;

Figure 27 is a block diagram of a phase rotator core circuit six pack;

Figure 28 is a schematic diagram of a phase rotator core circuit;

Figure 29 is a schematic diagram of a phase rotator core buffer circuit;

Figure 30 is a schematic diagram of a phase rotator core buffer post-1 buffer circuit;

Figure 31 is a block diagram of another embodiment featuring a basic FIR filter approach with an eight stage/phase ring oscillator;

Figure 32 is a graphical representation of the stepwise change of output phase by a phase rotator;

Figure 33 is a simplified schematic for a six phase version of a phase rotator; and

Figure 34 provides a detail view of one circuit block of the phase rotator of Figure 33.

DESCRIPTION WITH REFERENCE TO THE DRAWINGS 

Referring now to the drawings and, for the present, to Figure 1, a high level diagram is shown of the interconnection of ASICs, with a transmitter on one side of each connection and a receiver on the other side, for several transmitter and receiver pairs that pass information. The embodiment can be implemented in any one of several different configurations, such as a combination of printed circuit boards, connectors, back plane wiring, fiber or cable. As shown, the implementation is on a back plane with hard wiring between the transmitting portion and the receiving portion.

As can be seen in Figure 1, a back plane 10 is provided which has mounted thereon a pair of printed circuit (PC) cards 12a and 12b. Each circuit card 12a and 12b is provided with, respectively, ASIC chips 14a and 14b which are to be interconnected. Each ASIC 14a, 14b has at least one transmitter 16 and, as illustrated, has two such transmitters, although more can be provided. Also, each ASIC 14a, 14b is provided with at least one receiver 18; again, the illustrated embodiment shows two receivers 18 although, as indicated above with respect to the transmitter 16, more than two can be provided. Generally speaking, the transmitter 16 and receiver 18 are provided in pairs, since data generally will have to flow in both directions and the connection described herein is unidirectional. Each transmitter 16 on ASIC 14a or 14b includes one-way hard wired serial buses 20 interconnecting the transmitter 16 on one ASIC 14a or 14b to a receiver 18 on the other ASIC 14a or 14b. Thus, two-way communication is provided by having paired transmitters and receivers on each ASIC 14a or 14b.

Briefly, each transmitter 16 has stored therein parallel digital data in a register 24 (Fig. 2). The transmitter 16 converts this stored, parallel, digital data in the register 24 in one ASIC, e.g. 14a, to serial analog form and transmits the data in serial analog form on one of the serial buses 20 to the receiver 18 associated therewith on the opposite ASIC, e.g. 14b. The receiver 18 converts the analog asynchronous serial data to synchronous, parallel, digital data for storage in digital form in a register 68 (Figure 3).
Thus, the function of the serial link herein is to take parallel data from a register in an efficient manner, transmit it in an asynchronous serial analog form and reconvert it to synchronous, parallel, digital data.

Referring now to Figure 2, a block diagram of the circuitry function of a transmitter 16 is shown. As can be seen, the transmitter 16 includes a bit register 24. Typically, this is either an eight-bit or a ten-bit register, although registers of other sizes could be used. The register 24 will be described here as a ten-bit register. A two-bit-of-ten-bit selector 26 is provided which selects two bits at a time sequentially from the register 24. This is done under the synchronous control of counter 38. It is to be understood that other than two bits at a time can be read from the register 24. However, this number must divide evenly into the number of bits in the register 24. Thus, in the case of a ten-bit register, it could be one, two or five and, in the case of an eight-bit register, it could be one, two or four. Two bits are preferred.

Each of the two bits selected by the selector 26 from the register 24 is provided to a bit latch 28a or 28b. This selection and delivery is also under the synchronous control of counter 38. The bits are then delivered from the latches 28a and 28b to a multiplexor 30, also under the synchronous control of counter 38, and then to a one-bit latch 32. From the one-bit latch 32, the bits are delivered to a driver equalizer 34, which converts the received digital bits from the latch 32 to a serial analog signal output 35 containing the converted digital bits.
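The selector, latch and multiplexor sequence can be modelled in software as a short sketch. It assumes that the bits of the word are taken least significant bit first, which is the order stated later for the Figure 5 bit selector 212, and it treats each pass of the counter 38 as one selector step. The function name `serialize_word` is hypothetical. The check on `group` mirrors the requirement that the group size divide evenly into the register width.

```python
from typing import List


def serialize_word(word: int, width: int = 10, group: int = 2) -> List[int]:
    """Hypothetical model of register 24 -> selector 26 -> latches 28a/28b
    -> multiplexor 30 -> one-bit latch 32, producing the serial bit order."""
    if group <= 0 or width % group:
        raise ValueError("group size must divide evenly into the register width")
    stream: List[int] = []
    for step in range(width // group):          # counter 38 advances the selector
        # Selector 26 picks the next `group` bits (LSB first, assumed) into the latches.
        latches = [(word >> (step * group + i)) & 1 for i in range(group)]
        for bit in latches:                     # multiplexor 30 alternates between latches
            stream.append(bit)                  # one bit per full-rate clock via latch 32
    return stream
```

For example, `serialize_word(0b1011001110)` yields the bits `0, 1, 1, 1, 0, 0, 1, 1, 0, 1`. Choosing `group=5` produces the same stream in two selector steps instead of five.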

A single phase, full rate, phase lock loop 36 is provided which clocks the action of the latch 32 and driver equalizer 34, and also actuates the counter 38 which, in turn, has inputs to the multiplexor 30, the latches 28a and 28b, the selector 26 and the ten-bit register 24. The phase lock loop 36 has as an input thereto a clock signal, which can be internal or external from clock 40, as shown. The counter 38 provides synchronous operation of the extraction of the bits from the register 24 by the selector 26 for delivery to the latches 28a and 28b. The counter also provides synchronous delivery of the bits from the latches 28a and 28b to the multiplexor 30 and therefrom to the latch 32. It is at the driver equalizer 34 that the synchronously received digital bits are converted to a serial analog signal 35. The function of the various parts of the transmitter 16, such as the bit register 24, selector 26, the latches 28a and 28b, the multiplexor 30, the latch 32, the single phase, full rate, phase lock loop 36 and the counter 38, is described hereinafter in more detail with reference to Figures 4 to 34. The analog output 35 is placed on the serial bus 20 and is transmitted in an asynchronous form to the receiver 18 attached to the other end of the serial bus 20. As indicated above, the receiver 18 receives the asynchronous analog signal and converts it to a synchronous digital parallel signal corresponding to the digital bits in register 24 for storage in the receiver 18.



Referring now to Figure 3, a block diagram is shown of the structure and circuitry for converting the asynchronous analog serial signal 35 to synchronous, parallel digital bits for storage in the receiver 18. The serial analog asynchronous signal 35 is received by a signal receiving member 50, which delivers the analog signal to sample latches 52. In the sample latches 52, the analog signal is converted to a digital signal by means of a phase rotator 54, which operates under the control of a data detection and edge detection circuit 58 and a multi-phase, half rate phase lock loop 60. This technique operates by sampling, and preferably multiple sampling, both edges of the data in the analog signal and converts the data in the analog signal to parallel data bits. Preferably, the multiple samples are used to determine the approximate center point of each resulting data bit. This is an oversampling circuit which converts the asynchronous analog serial signal, in selector 62, to a digital output 63 in two-bit increments delivered to a shift register 64. A counter 66, which is actuated by the phase rotator 54, operates on shift register 64 to output the two-bit digital signals as ten-bit synchronous signals to ten-bit register 68. The operation of this receiver 18 is described hereinafter in detail with relation to Figures 4 to 34.
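The receive side of the two-bit path can be sketched in the same spirit. Two-bit outputs from the selector 62 are packed into a shift register, and once the counter has seen five dibits, a complete ten-bit word is transferred in parallel. This is an illustrative model only. The name `deserialize`, the least-significant-bit-first packing (chosen to match the bit order stated for the Figure 5 bit selector 212) and the (first, second) ordering within each dibit are assumptions, not details taken from the figures.

```python
from typing import Iterable, List, Tuple


def deserialize(dibits: Iterable[Tuple[int, int]], word_bits: int = 10) -> List[int]:
    """Hypothetical model of shift register 64, counter 66 and register 68:
    collect two-bit outputs and emit a parallel word every word_bits // 2 dibits."""
    per_word = word_bits // 2
    words: List[int] = []
    shift, count = 0, 0
    for first, second in dibits:
        # Pack the dibit into the shift register, earlier bits in lower positions (assumed).
        shift |= (first | (second << 1)) << (2 * count)
        count += 1
        if count == per_word:      # counter 66 has seen a full word
            words.append(shift)    # parallel transfer into register 68
            shift, count = 0, 0
    return words                   # an incomplete trailing word is not emitted
```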

Thus, the ten digital bits stored as parallel data in the ten-bit register 24 are converted by the transmitter 16 to an asynchronous analog serial signal 35, which is transported asynchronously on bus 20 and is then reconstituted by the receiver 18 into the original ten parallel digital bits in register 68.


As explained hereinafter with respect to Figures 4 to 8, the 
transmitter PLL 36 and the receiver PLL 60 are each provided as a dual loop 
phase locked loop control circuit having a digital coarse loop and an analog 
fine loop. 


The PLL control architecture is intended to provide the coarse PLL control loop for a dual-loop PLL. Lock is determined by comparing two Gray counters running on the reference and PLL clocks. Digital to analog conversion (DAC) bits, which set the coarse control voltage for the PLL, are controlled by monitoring a signal from the PLL (V_fine_H), which indicates which half of its operating range it is in. Figure 4 illustrates the PLL control circuit.
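The document does not say why Gray counters are used. A likely reason is the standard property of Gray code: consecutive values differ in exactly one bit, so a counter clocked by the PLL can be sampled from the reference clock domain (or vice versa) with at most a one-count error. The following is a minimal sketch of the usual conversion (`n ^ (n >> 1)` and its inverse). The function names are hypothetical, and the 10-bit width is chosen to match the 10-bit frequency-timer count used for lock detection.

```python
def to_gray(n: int) -> int:
    """Binary to Gray code: adjacent values differ in exactly one bit."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Gray code back to binary by folding in every right-shifted copy."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


WIDTH = 10  # assumed counter width, matching a 10-bit timer count
```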

More specifically, Figure 4 shows a full data rate PLL 110. This PLL is the clock source for the transmitted data and runs at the full data rate of, e.g., 2 to 3 Gbps. A stable frequency from a reference clock 112 is required for determining whether the PLL is locked to its correct frequency. The clock 112 operates at one-fourth of the full data rate. For example, a 625 MHz clock rate is used for an operational data rate of 2.5 Gbps. A single clock phase is buffered, brought out of the PLL, and used to drive into a phase buffer circuit.

The PLL contains a four-stage voltage controlled ring oscillator (VCO), a 4X frequency divider, a phase-frequency detector, a charge pump and a loop filter. These elements form the "fine" control loop. The VCO has both a 'fine' analog and a 'coarse' digital control voltage in order to minimize the required gain of the fine loop. The VCO can change its speed of oscillation by adjusting the local feedback within a delay cell, as well as by controlling feedback within the VCO for speed enhancement. In addition to the fine control loop elements, the PLL 110 contains a reference generator, a voltage comparator, PLL control logic, a low-pass filter and a digital to analog counter 132. These elements form the coarse control loop.
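The frequency relationships above can be checked with a little arithmetic. The sketch assumes the textbook ring-oscillator relation T = 2·N·t_d for N stages of uniform delay t_d. The document does not state this relation, and `tx_clock_plan` is a hypothetical helper.

```python
from typing import Tuple


def tx_clock_plan(data_rate_gbps: float, stages: int = 4, divide: int = 4) -> Tuple[float, float]:
    """Return (reference frequency in Hz, per-stage delay in seconds) for a
    full-rate transmit PLL whose VCO runs at the bit rate (one bit per cycle)."""
    f_vco_hz = data_rate_gbps * 1e9            # full rate: VCO frequency equals bit rate
    f_ref_hz = f_vco_hz / divide               # the 4X divider output must match the reference
    stage_delay_s = 1.0 / (2 * stages * f_vco_hz)  # ring period T = 2 * N * t_d (assumed)
    return f_ref_hz, stage_delay_s
```

At 2.5 Gbps this gives the 625 MHz reference quoted above and a stage delay of about 50 ps. At the ends of the 2 to 3 Gbps range, the reference would be 500 MHz and 750 MHz.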

The fine control loop is a conventional analog loop and is intended to provide a stable, low-noise, low-jitter clock source for the transmitter. The details of the fine control loop are well known in the art and do not by themselves comprise any part of the present invention.






The coarse control loop is a digital representation of a conventional analog control loop based on a 'leaky' loop filter capacitor. This type of loop relies on leakage from a loop filter cap (capacitor) to drive the control voltage in a particular direction regardless of the frequency of the VCO. A phase detector and charge pump that only increases the charge on the filter cap compensates for this leakage. The loop is stable when the charge being added to the cap balances the charge that is leaking.

A signal from the reference source 112 is fed into a reference counter 118. A prescaler 114 divides the frequency from the PLL 110 by four. A frequency comparator 120 compares the PLL counter 116 with the reference counter 118 to determine whether the divided-by-four PLL 110 output and the reference clock 112 are running at the same frequency, i.e. counting at the same rate. The two counters 116, 118 are compared over a period of time, e.g. a 10-bit count, as determined by the frequency timer 122. If, over this 10-bit count, the comparator determines that the counted values are maintaining a fixed distance from one another, the comparator 120 confirms that the PLL 110 is locked. The PLL lock 124 monitors the output of the frequency timer. Every time the frequency timer 122 reaches its maximum count, the PLL counter 116 and the reference counter 118 are reset; thus, the comparison is performed each time the frequency timer 122 times out. If, during the interval, the two counters 116, 118 have not compared equal to one another, i.e. the frequency comparator 120 has not become true, the clocks are assumed to be locked because the counters are not catching up with one another. If, however, the frequency timer 122 times out after the frequency comparator 120 has found the reference count and the PLL count equal, it declares that the PLL is unlocked. The two counters are also reset whenever the frequency timer 122 declares that the PLL is unlocked.
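The lock test can be sketched as a counter race over one frequency-timer window. This is a simplified model under stated assumptions. Frequencies are integers in MHz, and counts are sampled once per reference cycle. "Catching up" is modelled as the separation between the two counters drifting by a hypothetical `offset` of four counts in either direction. The name `pll_locked` and the default values are illustrative, not taken from the figures.

```python
def pll_locked(f_pll_mhz: int, f_ref_mhz: int, window: int = 1 << 10,
               offset: int = 4, prescale: int = 4) -> bool:
    """Hypothetical lock check: run the reference counter and the prescaled PLL
    counter for one 10-bit timer window and require a fixed distance between them."""
    for n in range(1, window + 1):            # n reference-clock cycles elapsed
        ref_count = n
        pll_count = (n * f_pll_mhz) // (prescale * f_ref_mhz)  # PLL/4 cycles in the same time
        if abs(pll_count - ref_count) >= offset:
            return False                      # one counter caught up: declare unlocked
    return True                               # distance held for the whole window: locked
```

A 2500 MHz PLL against a 625 MHz reference stays locked. At 2520 MHz, the counters drift apart by four counts after 500 reference cycles, and the window reports unlocked.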

A V_fine_H signal 126 is introduced from the analog section of the transmitter and indicates that the fine loop is at the center of its range. When the PLL locks and the fine loop is centered, the signal can move up and down from the center with some degree of latitude. This allows for perturbations of the system, such as temperature changes. The signals from the PLL and the fine loop are asynchronous and go to the sample latch 128. If the V_fine_H signal is not on and the PLL lock signal is not on, an 'up' signal is applied. This causes the decision counter 134 to count up, thereby causing the DAC counter 132 to also count up. When both signals are on, the system stops counting up.

The DAC counter is a binary search counter with 64 possible steps, counting up from 000000 to 111111. The counter steps through the different settings until it finds a setting where the PLL will lock.
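In the coarse search just described, the DAC code counts up until both the lock and V_fine_H conditions hold. This amounts to a linear sweep over the 64 settings. The sketch passes the two status signals in as hypothetical callables of the DAC code, because the document does not specify how they respond to each setting.

```python
from typing import Callable, Optional


def find_coarse_setting(pll_lock: Callable[[int], bool],
                        v_fine_h: Callable[[int], bool],
                        bits: int = 6) -> Optional[int]:
    """Count the DAC code up from 000000 until the PLL reports lock and the
    fine loop reports it is centred; return that code, or None if none works."""
    for code in range(1 << bits):        # 64 settings for a 6-bit DAC
        if pll_lock(code) and v_fine_h(code):
            return code                  # both conditions met: stop counting up
    return None                          # swept every setting without lock
```

With a toy model in which lock appears from code 37 and V_fine_H from code 40, the sweep stops at code 40 (binary 101000).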

The PLL control logic in the coarse control loop has an up/down counter 130, the value of which represents the charge on the loop filter cap. This counter 130 is slowly decremented to represent leakage. The voltage comparator output is high or low depending on whether the fine control voltage is operating in the upper or the lower half of its range. To balance the leakage, the control logic samples the comparator output. After multiple samples showing upper range operation, the up/down counter is incremented to represent adding charge to the loop filter cap. The DAC and low-pass filter convert the up/down counter output to a control voltage. The coarse control loop is intended to compensate for manufacturing process variations and for relatively low frequency but large changes due to power supply and temperature drift.
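The following is a cycle-by-cycle sketch of the leaky up/down counter. The leak period (one decrement every eight samples), the number of consecutive upper-half samples needed for an increment (three) and the 6-bit ceiling are hypothetical values. The document says only that the counter is "slowly decremented" and is incremented "after multiple samples".

```python
from typing import Iterable


def coarse_loop(upper_half: Iterable[bool], start: int, leak_period: int = 8,
                up_after: int = 3, max_value: int = 63) -> int:
    """Hypothetical leaky up/down counter 130: leak one count every leak_period
    samples and add one count after up_after consecutive upper-half samples."""
    value, highs = start, 0
    for i, high in enumerate(upper_half, start=1):
        if i % leak_period == 0 and value > 0:
            value -= 1                       # leakage from the loop filter cap
        if high:
            highs += 1
            if highs == up_after:            # enough upper-range evidence
                value = min(value + 1, max_value)  # add charge to the cap
                highs = 0
        else:
            highs = 0                        # lower half: no charge added
    return value
```

Starting from 10, sixteen lower-half samples leave the counter at 8 (two leaks). Twenty-four upper-half samples raise it to 15 (eight increments against three leaks). The value therefore rises or falls until added charge balances leakage, as in the analog loop it imitates.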

The transmit phase buffer consists of circuits which are designed to interface to the pre-drive section of the PLL and to present only a light load to the PLL. The phase buffer then drives out to a latch, providing the clock necessary for a full rate design. The phase buffer must also provide adequate rise and fall times, taking into account the estimated net loadings.

The driver/equalizer consists of current-mode differential drive circuits which are controlled by a finite impulse response (FIR) type filter function. This filter is implemented by the combination of a shift register containing the current outgoing data bit and a history of the three previous bits. This shift register, in turn, controls the activation of weighted current drivers. The output transfer function is of the general form $H(z) = A b_0 + A b_1 z^{-1} + A b_2 z^{-2} + A b_3 z^{-3}$, wherein the values of the $b_n$ coefficients are negative. The numerical values of the coefficients are set by register values in the logic. The determining factors for the values of these coefficients include the characteristics of the transmission media, the speed of transmission, the type of board connector used, the type of chip package, etc. The data bits are fed to the transmitter after the necessary conversion to the differential signal form and the powering up that is required to control the driver.
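The transfer function can be read as a four-tap filter applied to the drive levels. The sketch maps each bit to a ±1 level and uses hypothetical coefficients with b_0 positive and b_1 to b_3 negative. This is a common pre-emphasis sign convention and is assumed here. The three-bit history is initialized to the first bit's level, which is also an assumption.

```python
from typing import List, Sequence


def fir_pre_emphasis(bits: Sequence[int], b: Sequence[float],
                     amplitude: float = 1.0) -> List[float]:
    """Drive level per bit for H(z) = A*b0 + A*b1*z^-1 + A*b2*z^-2 + A*b3*z^-3."""
    if not bits:
        return []
    levels = [1.0 if bit else -1.0 for bit in bits]   # NRZ drive levels
    history = [levels[0]] * (len(b) - 1)              # previous bits, most recent first
    out: List[float] = []
    for level in levels:
        taps = [level] + history                      # current bit plus three-bit history
        out.append(amplitude * sum(c * x for c, x in zip(b, taps)))
        history = [level] + history[:-1]              # shift register advances one bit
    return out


B = [1.0, -0.25, -0.125, -0.0625]  # hypothetical coefficient register values
```

For the bits `0000 1111` with these coefficients, the first `1` after the run of zeros is driven at 1.4375, while the fourth consecutive `1` is driven at only 0.5625. In this sense, longer run lengths receive reduced drive current.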

The transmitter architecture is a multiplexing full-rate system. It is supported by three major analog blocks: a full data rate PLL, a phase buffer to repower the PLL signal for the driver, and an off-chip driver with built-in pre-emphasis equalization. In addition, there are specialized circuits for testing the PLL. Figure 5 shows a block diagram of the transmitter architecture. A PLL 210 controls a four-stage ring oscillator 240 running at the full bit frequency. This PLL is shared by four transmitters. The phase outputs are used as local recovered clocks and to clock the FIR section of the driver. Word data (eight or ten bits) is clocked into a register synchronously with a word clock 242 generated from the PLL clock. The word data is transferred two bits at a time to a dibit data register 230, which is then loaded one bit at a time into the transmit data register. The final output is transferred at the full bit rate to the driver/equalizer block 226. The transmitter also contains a pseudo-random bit stream (PRBS) generator and checker 232, which allows for self-testing in a wrap mode as well as link testing with a corresponding receiver.
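The document does not name the PRBS polynomial. As an illustration only, the sketch uses PRBS7 (x^7 + x^6 + 1), a common choice for link testing, implemented as a 7-bit Fibonacci LFSR. It also includes a simple checker that seeds its own generator from the first seven received bits and counts mismatches after that. The function names are hypothetical.

```python
from typing import List


def prbs7(n: int, seed: int = 0x7F) -> List[int]:
    """Generate n bits of PRBS7 (x^7 + x^6 + 1) from a 7-bit Fibonacci LFSR."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("an all-zero seed locks the LFSR at zero")
    bits: List[int] = []
    for _ in range(n):
        new = ((state >> 6) ^ (state >> 5)) & 1   # taps at stages 7 and 6
        state = ((state << 1) | new) & 0x7F
        bits.append(new)
    return bits


def prbs7_errors(rx: List[int]) -> int:
    """Seed a local generator from the first 7 received bits, then let it run
    freely and count received bits that differ from its prediction."""
    state = 0
    for bit in rx[:7]:
        state = ((state << 1) | bit) & 0x7F        # state now holds the last 7 bits
    errors = 0
    for bit in rx[7:]:
        expected = ((state >> 6) ^ (state >> 5)) & 1
        errors += int(expected != bit)
        state = ((state << 1) | expected) & 0x7F  # free-running: errors are not fed back
    return errors
```

In a wrap test, the checker would be fed the transmitter's own output. A clean 300-bit stream reports zero errors, and a single flipped bit reports one.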
In more detail, the transmitter structure takes eight or ten bit data from the dibit data register 230 and synchronously transfers the data two bits at a time through a bit selector 212 to a first holding latch 214 and a second holding latch 216. The bit selector 212 processes each two-bit data pair least significant bit first, followed by the next more significant bit. The counter 222 tracks the number of pairs that are processed and the order of each bit in the pair. When the counter senses that all bit-pairs have been serialized, the interfacing logic is notified to send another word for processing, and the eight bit/ten bit register is clocked to latch the new data to be processed. The contents of the first and second latches 214, 216 are transferred in an alternating fashion, under the control of dibit clock 224, to a dibit data register 218 and then to a single latch 220. A bit stream from this single latch 220 is transmitted to a driver/equalizer 226. This device takes the bit stream and creates a current-mode differential signal that is frequency equalized for the assumed media channel. The equalization is a finite impulse response (FIR) pre-emphasis type using reduced current levels for longer run lengths.






The driver equalizer consists of current-mode differential drive 
circuits which are controlled by the FIR filter function commonly employed 
for this purpose. The filter is implemented by the combination of a shift 
register containing the current outgoing data bit and a history of the 
three most recent bits of outgoing data. The shift register, in turn, 
controls the activation of weighted current drivers. 



The receiver architecture or core is a three-fold oversampled half-rate system with a 54-step phase rotator, an advanced digital bang-bang control circuit and an implementation of a sample processing algorithm that centers the static edge in the middle between two samples. The receiver takes a signal, such as an NRZ encoded baseband signal, from a serially wired transmitter and aligns the edges to determine where the signal switches between '1's and '0's. As with other signals, the problem is placing the center point between the edges. This is achieved by sampling the signal and generating early or late signals based on whether the signal is being sampled too early or too late. When early signals occur more frequently than late signals, the system drifts in the 'early' direction. Conversely, it drifts in the 'late' direction when late signals occur more frequently than early signals.

The present arrangement addresses the problem of incorrect decisions based on over-the-edge sampling by the use of oversampling with evenly spaced samples, but without placing a sample over the edge of the bit. Instead, this invention positions the samples so that no sample is on the bit edge; samples are placed on either side of the edge. This method has a reduced probability of incorrectly predicting the position of the edge in the presence of random phase noise. This improvement directly affects the BER (bit error rate), whose reduction is a primary goal of such systems.

The data is oversampled, and a digital circuit detects the edge position in the data stream. This digital circuit not only selects the optimum data sample but also generates early and late signals if the detected edge is not at its expected position. No signal is generated if no edge is found. With three or more evenly spaced samples, fewer errors are made in detecting the edge, because no sample is centered on the data edge, and the detector is less likely to make consecutive incorrect decisions. The receiver architecture is a three-fold oversampled half-rate system with a 54-step phase rotator and an algorithm, such as an adaptive sample processing algorithm, centering the bit edge in the middle between two samples.
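The following is a much-simplified sketch of early/late generation from three samples per bit. It is not the document's pattern recognition algorithm over ten samples. It only illustrates the idea of keeping samples on either side of the edge and voting on the result. The window alignment, the mapping of edge positions to "early" and "late", and the sign of the rotator step are all assumptions, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def detect(samples: Sequence[int]) -> Tuple[List[int], int, int]:
    """Three samples per bit; samples[3k:3k+3] is the window for bit k.

    The middle sample is the data decision. A transition between the first and
    middle samples is counted as 'early' (window opened before the data edge);
    one between the middle and last samples as 'late'. An edge falling between
    windows is the centred case and, like a window with no edge, gives no signal.
    """
    data: List[int] = []
    early = late = 0
    for k in range(len(samples) // 3):
        s0, s1, s2 = samples[3 * k: 3 * k + 3]
        data.append(s1)
        if s0 != s1:
            early += 1
        elif s1 != s2:
            late += 1
    return data, early, late


def rotator_step(position: int, early: int, late: int, steps: int = 54) -> int:
    """Bang-bang update of a 54-step rotator: one step toward the majority vote."""
    if early > late:
        return (position + 1) % steps   # delay the sampling phase (assumed sign)
    if late > early:
        return (position - 1) % steps   # advance the sampling phase (assumed sign)
    return position
```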

A phase locked loop (PLL) controls a three-stage voltage controlled ring oscillator (VCO) running at half the bit frequency. Each stage includes a voltage controlled current source coupled to an n-type MOS (metal-oxide semiconductor) transistor. The current source is preferably a p-type MOS transistor. The oscillator is controlled by a voltage signal and by a current signal.






Each PLL can be shared by multiple receivers. The six phases from the
VCO are fed into a phase rotator having 54 steps for a 2π interval. The 54
steps are generated with a finite impulse response (FIR) phase rotator
having six phases with three inter-slice phase steps that are further
divided by three.
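The resolution implied by these figures can be checked with simple arithmetic. The sketch assumes only what is stated: six VCO phases, three inter-slice steps each divided by three (nine steps per phase slice), and a half-rate clock whose 2π period spans two bit intervals. The 1.25 Gbps rate in the example is the one quoted later for the receiver PLL; the function names are illustrative.

```python
from fractions import Fraction

VCO_PHASES = 6                 # six phases from the three-stage VCO
STEPS_PER_PHASE_SLICE = 3 * 3  # three inter-slice steps, each divided by three
BITS_PER_CLOCK_PERIOD = 2      # half-rate clock: one period covers two bits


def rotator_resolution():
    """Return (steps per 2*pi, step in degrees of the clock, step in bit intervals)."""
    steps = VCO_PHASES * STEPS_PER_PHASE_SLICE
    return steps, Fraction(360, steps), Fraction(BITS_PER_CLOCK_PERIOD, steps)


def step_size_ps(data_rate_gbps):
    """Rotator step expressed in picoseconds for a given serial data rate."""
    _, _, step_ui = rotator_resolution()
    bit_interval_ps = 1000.0 / data_rate_gbps
    return float(step_ui) * bit_interval_ps


if __name__ == "__main__":
    steps, deg, ui = rotator_resolution()
    print(steps, deg, ui)                 # 54 20/3 1/27
    print(round(step_size_ps(1.25), 1))   # 29.6 (ps at 1.25 Gbps)
```

One step of 1/27 of a bit interval agrees with the 27 sample-point positions per bit that the text gives for the 54-step counter.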



The six outputs of the rotator are buffered, and the edges are shaped
so that they can sample a signal having twice the frequency. One of the phase
outputs is used as a local recovered clock. A clock buffer ensures that this
clock does not load the phase rotator excessively. Logic timing analysis
determines which phase is optimal to use as the local recovered clock.
The output section of the phase rotator suppresses common mode signals and
performs signal limiting.

The output is then driven out to the phase buffers (with the signals
from the phase rotator), which in turn provide clocks. Six samples are
taken over a two-bit interval. Three pipeline stages are added in order to
reduce the probability of a metastable state to a value much lower than the
targeted bit error rate. The stages also help to align the data to a
single clock phase. In order to process information from more than one bit
interval for the recovery of one data bit, a memory stage reuses four
samples from the previous sampling period. A total of 10 samples is,
therefore, fed into the half-rate edge and data detection correlation
blocks, which use a pattern recognition algorithm.
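The memory stage can be modeled as a short buffer. In the sketch below, which is a hypothetical software model rather than the circuit, each half-rate cycle contributes six new samples and the last four samples of the previous cycle are prepended, giving the ten-sample window fed to the correlation blocks. The oldest-first ordering and the all-zero initial memory are assumptions.

```python
from collections import deque

NEW_SAMPLES_PER_CYCLE = 6   # six samples over a two-bit interval
REUSED_SAMPLES = 4          # samples kept by the memory stage


class SampleWindow:
    """Hypothetical software model of the memory stage feeding the correlators."""

    def __init__(self):
        # Initial memory contents are an assumption (all zeros).
        self._memory = deque([0] * REUSED_SAMPLES, maxlen=REUSED_SAMPLES)

    def next_window(self, new_samples):
        """Return the 10-sample window: 4 remembered samples, then 6 new ones."""
        if len(new_samples) != NEW_SAMPLES_PER_CYCLE:
            raise ValueError("expected six samples per half-rate cycle")
        window = list(self._memory) + list(new_samples)
        self._memory.extend(new_samples)  # maxlen keeps only the newest four
        return window
```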

The outputs of the edge and data detectors are the two recovered bits
and the early and late signals going to the phase rotator control state
machine. A bang-bang control circuit with adaptive step size is used for
this purpose. The rotator counter and temperature code generator generate
the 54 control signals for the phase rotator, and this closes the CDR loop.

The data path consists of a shift register which loads two bits from
the data correlation blocks during each half-rate cycle. The contents of the
shift register are loaded into a word data register (8 or 10 bits) using a
word clock derived from the PLL clock.
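The word assembly can be sketched as follows. With two bits loaded per half-rate cycle, an 8-bit word is complete every four cycles and a 10-bit word every five, so the word clock corresponds to the half-rate clock divided by four or five (an eighth-rate or tenth-rate clock). The function name, the MSB-first bit order and the 0/1 integer representation are assumptions.

```python
def deserialize(bit_pairs, word_bits=10):
    """Assemble words from (earlier_bit, later_bit) pairs, one pair per half-rate cycle.

    word_bits selects 8- or 10-bit words (the mode control bit). Bits are shifted
    in MSB-first, which is an assumption made for this illustration.
    """
    if word_bits not in (8, 10):
        raise ValueError("word_bits must be 8 or 10")
    cycles_per_word = word_bits // 2   # 4 cycles for 8 bits, 5 for 10 bits
    mask = (1 << word_bits) - 1
    words, shift_register, cycle = [], 0, 0
    for earlier, later in bit_pairs:
        shift_register = ((shift_register << 2) | (earlier << 1) | later) & mask
        cycle += 1
        if cycle == cycles_per_word:   # word clock: load the word data register
            words.append(shift_register)
            cycle = 0
    return words


if __name__ == "__main__":
    pairs = [(1, 0), (1, 0), (1, 1), (0, 0)]
    print(format(deserialize(pairs, word_bits=8)[0], "08b"))  # 10101100
```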

The receiver architecture is supported by four major analog elements:
a half data rate PLL, a phase rotator, a phase buffer and a sample latch.
The function of each of these elements will be described in more detail
hereinafter. The data interface for each receiver link comprises an output
data bus, a mode control bit, and an output data clock. The mode control
bit determines whether the receiver core is operating on an 8-bit or a 10-bit
transmitter output.

The phase adjustment and clock recovery are done by a phase rotator,
and not by using a DLL or PLL control loop. If there is more than one
analog PLL or DLL on one chip, these circuits tend to interact via supply
and/or substrate coupling. It would, therefore, be difficult to control
their phase/delay in an analog fashion. The use of an independent PLL
clock generator and an external phase rotator makes the system more immune
to injected noise. The control of the phase shift is digital. The system
operates at half the bit rate. For analysis of edges and data, however,
three-bit intervals are used in order to have a half-bit overlap on both
sides. Some of the actual samples are reused in the analysis cycle
described below. The rotator control state machine has a 'fly wheel'
function: it monitors the phase update rate and applies an update even if
no edge information is measured. This allows a TX to RX frequency offset
to be tracked even with a very long (>1024 bit) run length, provided the
jitter of the two clocks is small enough.
The receiver analyzes the oversampled data stream and generates two
sets of correlation output signals, the detected bit values and the early
and late signals, for an eventual update of the phase rotator. When the
detected bit edge is centered between two samples, there is a 'dead zone' in
the CDR control loop if no jitter is present. With jitter larger than the
sample spacing, the loop will average the detected sample crossings
and will position the edge midway between two samples. This situation
differs from that of a PLL phase detector with a dead zone, because the
jitter is much larger and the phase control is digital, with no leakage
effects. The probability of generating a metastable sampling output is
reduced for a middle edge position because the probability of an edge
being positioned exactly on a sample is reduced.

The receiver structure performs clock and data recovery (CDR) on the
incoming serial data stream. The quality of this operation is a dominant
factor in the bit error rate (BER) performance of the system. In order to
overcome the drawbacks of the conventional methods, feed forward and
feedback controls are combined in one receiver architecture. The data is
oversampled and a digital circuit detects the edge position in the data
stream. This digital circuit not only selects the optimum data sample, but
also generates an early or late signal if the detected edge is not at its
expected position. No signal is generated if no edge is found. The phase
rotator control state machine processes the early and late signals from the
edge correlation outputs to control the output phase settings of a
multi-phase PLL in a feedback loop. This feedback loop handles low
frequency jitter of unlimited amplitude, while the feed forward section
suppresses high frequency jitter of limited amplitude. The static edge
position is held constant in the oversampled data array by continuous
adjustment of the sampling phases with the early and late signals.

In principle, the early/late signals can be used to directly control
the output phase positions of a multiphase clock generator PLL. This would,
however, require one PLL per channel or receiver. If a phase rotator
device is used to control the phase output of the clock generator, one PLL
may be used for several receivers.

Figure 3 shows a phase rotator 54, which is a building block that
accepts several input phases from a multiphase half rate PLL 60 and performs
a simultaneous shift of all phases by a fixed number of degrees. In one
adjustment step, only a given predetermined phase step may be accomplished
in order to guarantee that no glitch occurs. The overall phase shift is
unlimited (modulo 360 degrees) to allow 'round-robin' operation. This
building block is part of a clock/data recovery phase locked loop in the
conventional sense. Receiver 50 takes transmitted data and forwards it to
sample latches 52. The digital data and edge detector 58 and the selector
62 select the optimal sample from the available samples to send to the
deserializing shift register 64. The sample is then transferred to the
8/10 bit data register 68. The counter 64 provides overall clocking of
fractional rate logic within the design. In other words, it divides the
half rate clock coming out of the PLL 60 and produces a quarter rate clock,
as well as an eighth rate clock and a tenth rate clock.



Figure 6 illustrates in greater detail a block diagram of the receiver
architecture of the present invention. A phase locked loop (PLL) 310
receives a signal from a reference clock 308. The PLL includes and controls
a voltage controlled three-stage ring oscillator (VCO) running at half the
bit frequency. This PLL 310 is shared by four receivers, one of which, 316,
is shown. The six phases from the VCO are fed into a phase rotator 312 having
54 steps for a 2π interval. The 54 steps are generated with a finite
impulse response (FIR) phase rotator having six phases with three
inter-slice phase steps that are further divided by three.






The six outputs of the rotator 312 are buffered, and the edges are
shaped so that they can sample a signal having twice the frequency. One of the
phase outputs is used as local recovered clock 314. A clock buffer (not
shown) ensures that this clock does not load the phase rotator excessively.
Timing analysis determines which phase is optimal to use. The output section
of the phase rotator suppresses common mode signals and performs signal
limiting.






The output is then driven out (with the signals from the phase
rotator) to the phase buffers and to a sample latch complex 318, which
samples the incoming data. Six samples are taken over a two-bit interval.
The sample latch complex is a CMOS, positive edge triggered latch. It takes
differential data inputs and a single ended clock, and outputs a single
ended, logic level signal. The complex consists of two circuits: the latch
itself and a buffer that sharpens the output to the receive logic. The
retiming latches 320 typically have a multiplexer (not shown) in front of
them. This allows the latches to receive either the output of the sample
latches 318 or input from the PRBS register 360, depending on whether data
is being received from the receiver path or from the PRBS register 360. The
pipeline stages from the PRBS register 360 reduce the probability of a
metastable state to a value much lower than the targeted bit error rate.
The retiming latches 320 also help to align the data to a single clock
phase. In order to process information from more than one bit interval for
the recovery of one data bit, a memory stage 322 reuses four samples from
the previous sampling period. A total of 10 samples is, therefore, fed into
two half rate edge and two data detection correlation decoders 324, 326,
328, 330, which use a pattern recognition algorithm. Truth Table 3
represents the initial best guess for the data.

The outputs of the edge and data detectors are the two recovered bits
and the early and late signals going to the phase rotator control state
machine 340. This involves the use of a bang-bang control circuit with
adaptive step size. The state machine 342 can be viewed as a digital filter
that evaluates the early and late signals and commands an adjustment of the
sample point. The rotator counter 342 and temperature code generator 334
generate the 54 control signals for the phase rotator, and this closes the
CDR loop.

The data path includes a shift register 350 which loads two bits
from the data correlation blocks during each half-rate cycle. The contents of
the shift register are loaded into a word data register 352 (8 or 10 bits)
using a word clock derived from the PLL clock. A rate counter 354 controls
the shift register 350 and the 8/10 bit register 352.

The receiver also contains a pseudo-random bit stream (PRBS) generator
and checker (shown within dotted lines 376), which allows for self-testing
in a wrap mode as well as link testing with a corresponding transmitter. A
built-in self test is designed for use in receive loop-back mode. This
involves a linear feedback shift register (LFSR) 372a which generates a
pseudo-random code sequence. In this mode, the logic within the receiver
core injects the generated code sequence into the first stage of the
receive logic, monitors the deserialized receive data, synchronizes the
receive data to the code sequence and verifies that a matching code sequence
generated by a second LFSR 372b has arrived at the receiver output. The
patterns are compared using an XOR 374. This serves to monitor and control
the performance of the phase rotator 312. Both of the LFSRs 372a, 372b are
part of the PRBS function.
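The generator and checker can be illustrated in software. The polynomial used by LFSRs 372a and 372b is not given in the text; the sketch assumes the common PRBS7 polynomial x^7 + x^6 + 1, whose maximal-length sequence repeats every 127 bits. The XOR comparison mirrors XOR 374: any bit where the received and locally generated sequences differ counts as an error. Function names are hypothetical.

```python
def prbs7(length, seed=0x7F):
    """Return `length` bits from a 7-bit Fibonacci LFSR (x^7 + x^6 + 1, assumed)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero or the LFSR locks up")
    bits = []
    for _ in range(length):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1   # taps at stages 7 and 6
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return bits


def count_bit_errors(received, expected):
    """XOR-compare two synchronized bit streams and count mismatches."""
    return sum(r ^ e for r, e in zip(received, expected))


if __name__ == "__main__":
    reference = prbs7(127)
    received = list(reference)
    received[10] ^= 1            # inject a single bit error
    print(count_bit_errors(received, reference))  # 1
```

In the hardware the two LFSRs must first be synchronized, as the text describes; the sketch assumes the streams are already aligned.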

The receiver circuit is of a differential type containing fixed input
bias (for power savings), which translates the input signal to a level
compatible with a high speed differential latch. The output circuits are
powered up to support the necessary loading from the latches and wiring.
The receiver phase locked loop (PLL) is the clock source for oversampling
the receive data and runs at half the data rate. It typically operates
over a given range, e.g. from 1.0625 Gbps to 1.5625 Gbps. A frequency
reference of one-half the target data rate is required. For example, 625 MHz
is required for an operational data rate of 1.25 Gbps. Six clock phases are
buffered and brought out of the PLL and are intended to drive a phase
rotator circuit.

The PLL contains a three-stage voltage controlled ring oscillator, a
2X frequency divider, a phase-frequency detector, a charge pump and a loop
filter. These elements form the "fine" control loop. The VCO has both a
"fine" and a "coarse" control voltage in order to minimize the required gain
of the fine loop. In addition to the fine control loop elements, the PLL
contains a reference generator, a voltage comparator, PLL control logic, a
digital to analog converter (DAC) and a low-pass filter. These elements form
the "coarse" control loop.

The fine control loop is a conventional analog loop and is intended to
provide a stable, low-noise, low-jitter clock source for the receiver. The
range, gain and bandwidth of the loop are designed to compensate for
relatively high frequency but small perturbations due to power supply
changes and the coarse loop.

The coarse control loop is a digital representation of a conventional
analog control loop based on a 'leaky' loop filter capacitor. That type of
loop relies on leakage from the loop filter capacitor to drive the control
voltage in a particular direction regardless of the frequency of the VCO.
This leakage is compensated by a phase detector and charge pump that only
increase the charge on the capacitor. The loop is stable when the charge
being added to the capacitor balances the charge that is leaking. The PLL
control logic in the coarse control loop has an up/down counter whose value
represents the charge on a loop filter capacitor. This counter is slowly
decremented to represent leakage. The voltage comparator output is high or
low depending on whether the fine control voltage is operating in the upper
or lower half of its range. To balance the leakage, the control logic samples
the comparator output. After multiple samples showing upper range operation,
the up/down counter is incremented to represent adding charge to the loop
filter capacitor. The up/down counter output is converted to a control
voltage by the DAC and low-pass filter. The coarse control loop is intended
to compensate for manufacturing process variation and for relatively low
frequency but large changes due to power supply and temperature drift.
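The coarse loop's digital 'leaky integrator' can be sketched as below. The counter width, the leak interval and the number of consecutive upper-range comparator samples needed before an increment are not given in the text; the values used here (a 6-bit counter starting at mid-scale, a leak every 16 control-logic ticks, three consecutive 'upper' samples) and the class name are hypothetical. The returned value stands for the code sent to the DAC.

```python
class CoarseLoop:
    """Hypothetical model of the coarse-loop control logic.

    An up/down counter stands in for the charge on a loop filter capacitor. It is
    decremented slowly to represent leakage and incremented after several
    consecutive comparator samples report that the fine control voltage sits in
    the upper half of its range (representing added charge).
    """

    def __init__(self, bits=6, leak_period=16, up_threshold=3, start=32):
        self.max_value = (1 << bits) - 1
        self.leak_period = leak_period
        self.up_threshold = up_threshold
        self.count = start
        self._ticks = 0
        self._upper_run = 0

    def tick(self, fine_voltage_upper_half):
        """Advance one control-logic clock; return the DAC input code."""
        self._ticks += 1
        if self._ticks % self.leak_period == 0 and self.count > 0:
            self.count -= 1                      # slow decrement represents leakage
        if fine_voltage_upper_half:
            self._upper_run += 1
            if self._upper_run >= self.up_threshold:
                self.count = min(self.count + 1, self.max_value)  # 'add charge'
                self._upper_run = 0
        else:
            self._upper_run = 0
        return self.count
```

The loop settles where the increments triggered by upper-range operation balance the steady leak, which is the digital counterpart of the charge balance described for the analog leaky loop.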






The phase rotator is an analog circuit and, as such, is a device
allowing a step-by-step, glitch-free modulo shift of all n phases of the
ring oscillator at the input to any phase angle at the output. The modulo
capability guarantees phase and frequency compensation, the glitch-free
performance ensures that no bits are lost during rotation, and
'step by step' means that the amount of phase change is limited to one phase
slice for each clock cycle.

The concept of the phase rotator is based on finite impulse response
(FIR) filter principles. A ring oscillator may be seen as a circular array
of delay elements. By multiplying the outputs t_n of the array by
weighting factors m_n and summing the products, an FIR filter is built. The
number of taps determines the amount of oversampling and, therefore, the
order of the analog filter required for anti-alias filtering. If the
weighting factors can be changed dynamically, the FIR filter response can
be changed 'on the fly'. This allows dynamic adjustment of the output phase
of such a filter.
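To see how weighting the taps moves the output phase, the sketch treats each of the six oscillator phases as a unit phasor, spaced 60 degrees apart (a sinusoidal approximation assumed here), and computes the phase of the weighted sum. The weight vectors are the coefficient values for steps 0 to 9 of Table 2, taken as the number of '1's in each temperature code; treating every sub-factor as equal is a simplification of the w1 to w8 weights. Names are illustrative.

```python
import cmath
import math

PHASE_SPACING_DEG = 60.0  # six oscillator phases, assumed evenly spaced

# Coefficients c0..c5 for steps 0..9 of Table 2 (count of '1's in each code).
TABLE2_COUNTS = [
    (3, 6, 9, 9, 6, 3), (2, 6, 9, 9, 6, 4), (1, 6, 9, 9, 6, 5),
    (0, 6, 9, 9, 6, 6), (0, 6, 8, 9, 7, 6), (0, 6, 7, 9, 8, 6),
    (0, 6, 6, 9, 9, 6), (1, 5, 6, 9, 9, 6), (2, 4, 6, 9, 9, 6),
    (3, 3, 6, 9, 9, 6),
]


def rotator_output_phase(weights):
    """Phase in degrees (0..360) of sum(w_k * exp(j * k * 60 deg)) over the taps."""
    total = sum(w * cmath.exp(1j * math.radians(k * PHASE_SPACING_DEG))
                for k, w in enumerate(weights))
    return math.degrees(cmath.phase(total)) % 360.0


if __name__ == "__main__":
    phases = [rotator_output_phase(c) for c in TABLE2_COUNTS]
    for step, (prev, cur) in enumerate(zip(phases, phases[1:]), start=1):
        print(f"step {step}: {cur:6.1f} deg (+{cur - prev:.1f})")
```

Under these assumptions the output advances monotonically from 150 to 210 degrees, one 60-degree phase slice, over nine steps, but the individual steps are not equal in size. This is consistent with the later remark that the sub-factor values w1 to w8 must be chosen suitably for each step to be exactly one-ninth of a slice.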

The following illustration shows the phase rotator principle looking
at one of its outputs when using a ring oscillator with six phases as the
driving device. In this illustration, there are nine different weighting
factors m0 to m8 available. Each of these numbers is built by summing some
of the sub-factors w1 to w8. Table 1 shows the composition of m0 to m8 from
the sub-factors w1 to w8. The sub-factors may be implemented in a very
simple way using parallel connected transistors with width ratios w1 to w8.
The summed output current of these transistors corresponds to a weighting
factor m_n. Only one sub-factor is added or subtracted at a time. An analog
implementation of a summation (current summing) is not subject to any
glitch. This would not be the case for an analog multiplication.

TABLE 1

Weighting factor   Built from
m0                 0 (not used in initial configuration)
m1                 w1
m2                 w1 + w2
m3                 w1 + w2 + w3
m4                 w1 + w2 + w3 + w4
m5                 w1 + w2 + w3 + w4 + w5
m6                 w1 + w2 + w3 + w4 + w5 + w6
m7                 w1 + w2 + w3 + w4 + w5 + w6 + w7
m8                 w1 + w2 + w3 + w4 + w5 + w6 + w7 + w8



The stepwise change of the output phase occurs by sequentially
changing the weighting factors that determine the contribution from each
phase tap to the actual output. With a suitable choice of the weight values
w1 to w8, each step shifts the output phase by exactly one-ninth of a phase
slice. After the last rotating step, all weights have been shifted by one
tap position. This corresponds to a shift of one phase slice at the output
of the FIR filter.

By repetition of the above sequence, any phase setting may be tuned
in. Because this is a circular operation, the range of the output phase is
not limited to the 0 to 360 degree interval. This allows a continuous
variation of the phase and thereby a frequency adjustment. Because the
weighting factors are only changed by adding or subtracting one
sub-factor element at a time, no glitches can occur.

Each FIR coefficient c0 to c5 is controlled by a temperature code that
determines whether a sub-factor is 'on' or 'off'. The temperature codes
controlling the sub-factors for one phase step of a six-phase oscillator are
given in Table 2. It may be seen that after nine steps, the codes are
modulo shifted to the right by one coefficient position and, therefore, by
one oscillator phase. The basic phase granularity of the oscillator (360
degrees divided by the number of oscillator phases, i.e. 60 degrees for six
phases) is divided by a factor of nine in this case. This is a significant
advantage because it results in a lower static phase error.
TABLE 2

Temperature codes controlling which sub-factors are summed to form the
actual coefficients c0 to c5 (two phase shifts)

Step  c0           c1           c2           c3           c4           c5
0     000 000 111  000 111 111  111 111 111  111 111 111  000 111 111  000 000 111
1     000 000 011  000 111 111  111 111 111  111 111 111  000 111 111  000 001 111
2     000 000 001  000 111 111  111 111 111  111 111 111  000 111 111  000 011 111
3     000 000 000  000 111 111  111 111 111  111 111 111  000 111 111  000 111 111
4     000 000 000  000 111 111  011 111 111  111 111 111  001 111 111  000 111 111
5     000 000 000  000 111 111  001 111 111  111 111 111  011 111 111  000 111 111
6     000 000 000  000 111 111  000 111 111  111 111 111  111 111 111  000 111 111
7     000 000 001  000 011 111  000 111 111  111 111 111  111 111 111  000 111 111
8     000 000 011  000 001 111  000 111 111  111 111 111  111 111 111  000 111 111
9     000 000 111  000 000 111  000 111 111  111 111 111  111 111 111  000 111 111
10    000 001 111  000 000 011  000 111 111  111 111 111  111 111 111  000 111 111
11    000 011 111  000 000 001  000 111 111  111 111 111  111 111 111  000 111 111
12    000 111 111  000 000 000  000 111 111  111 111 111  111 111 111  000 111 111
13    000 111 111  000 000 000  000 111 111  011 111 111  111 111 111  001 111 111
14    000 111 111  000 000 000  000 111 111  001 111 111  111 111 111  011 111 111
15    000 111 111  000 000 000  000 111 111  000 111 111  111 111 111  111 111 111
16    000 111 111  000 000 001  000 011 111  000 111 111  111 111 111  111 111 111
17    000 111 111  000 000 011  000 001 111  000 111 111  111 111 111  111 111 111
18    000 111 111  000 000 111  000 000 111  000 111 111  111 111 111  111 111 111


It is understood that this table shows 18 steps for two phases of the
oscillator, whereas a total of 54 steps is required for all six phases. The
code for the remaining 36 steps can readily be determined from the pattern
of the 18 steps shown in the table.
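One reading of that pattern, inferred from Table 2 rather than stated in the text, is sketched below. Within each phase slice, one sub-factor is switched off in one coefficient and one is switched on in another at every step, in three groups of three steps; each following slice repeats the same moves with every coefficient index shifted by one. The generator reproduces the rows of Table 2, keeps the total number of active sub-factors constant, and returns to step 0 after 54 steps. All names are hypothetical.

```python
NUM_COEFFS = 6                               # c0..c5
BITS_PER_CODE = 9                            # nine switches per coefficient, as in Table 2
STEPS_PER_SLICE = 9
TOTAL_STEPS = NUM_COEFFS * STEPS_PER_SLICE   # 54
START_COUNTS = (3, 6, 9, 9, 6, 3)            # number of '1's in each code at step 0

# (coefficient switched off, coefficient switched on) for each group of three
# steps in the first phase slice; later slices shift both indices by one.
SLICE_MOVES = ((0, 5), (2, 4), (1, 0))


def counts_at_step(step):
    """Number of active sub-factors in c0..c5 after `step` rotator steps."""
    counts = list(START_COUNTS)
    for t in range(step % TOTAL_STEPS):
        slice_index, within = divmod(t, STEPS_PER_SLICE)
        off, on = SLICE_MOVES[within // 3]
        counts[(off + slice_index) % NUM_COEFFS] -= 1
        counts[(on + slice_index) % NUM_COEFFS] += 1
    return counts


def temperature_code(ones):
    """Right-aligned temperature code with `ones` active bits, grouped by three."""
    bits = "0" * (BITS_PER_CODE - ones) + "1" * ones
    return " ".join(bits[i:i + 3] for i in range(0, BITS_PER_CODE, 3))


def table2_row(step):
    return [temperature_code(n) for n in counts_at_step(step)]


if __name__ == "__main__":
    for step in range(0, TOTAL_STEPS + 1, 9):
        print(step, "  ".join(table2_row(step)))
```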

The receive phase buffers consist of circuits which are designed to
interface to the output drive sections (all phases) of the phase rotator
circuit, while subjecting the phase rotator to only light loading. The
phase buffers then drive from the phase rotator to a set of latches while
providing the required input drive necessary for the phase rotator circuit.
The receive phase buffers operate at the rate necessary for a half rate
design. The phase buffers also provide adequate rise and fall times, taking
into account the estimated net loadings.

The sample latches are fed data by the input receiver circuit, and
obtain clocks from the combination of the PLL, phase rotator circuit and
phase buffer complex. The data input to the sample latches is differential
in nature and, as such, the sample latches are pseudo analog circuits. The
designs of the input receiver and the sample latches are very closely
coordinated to minimize the effects of noise on the jitter associated with
these two circuits. Typically, the sample latch is a CMOS positive edge
triggered latch.

The method for the phase rotator control is an advanced bang-bang
state machine with eight-fold initial early/late averaging, such as that
shown in Figure 3. It has 16 states and may be implemented using four
latches. The state machine 340 has two inputs, one for early and one for
late. The early and late signals are a function of the input sample
pattern. They are generated by use of an edge and data correlation table of
the type shown in Table 3.

TABLE 3

Full Rate Patterns for Early and Late Signals

pattern  EL       pattern  EL       pattern  EL       pattern  EL
0000000  00  G    1111111  00  G    0100101  00       1011010  00
0000001  01  *    1111110  01  *    1010010  00       0101101  00
1000000  10  *    0111111  10  *    0110010  00       1001101  00
0000010  00       1111101  00       0100110  00       1011001  00
0100000  00       1011111  00       0100111  00       1011000  00
0000011  00  G    1111100  00  G    1110010  00       0001101  00
1100000  00  G    0011111  00  G    0101001  00       1010110  00
0000100  00       1111011  00       1001010  00       0110101  00
0010000  00       1101111  00       0101010  00       1010101  00
0000101  00       1111010  00       0101011  00       1010100  00
1010000  00       0101111  00       1101010  00       0010101  00
0000110  10  ?    1111001  10       0110001  01       1001110  01  ?
0110000  01  ?    1001111  01  ?    1000110  10       0111001  10  ?
0000111  10  *    1111000  10  *    0110011  00       1001100  00
1110000  01  *    0001111  01  *    1100110  00       0011001  00
0001001  00       1110110  00       1000001  00       0111110  00
1001000  00       0110111  00       1000011  00  G    0111100  00  G
0001010  00       1110101  00       1100001  00  G    0011110  00  G
0101000  00       1010111  00       1000101  00       0111010  00
0001011  00       1110100  00       1010001  00       0101110  00
1101000  00       0010111  00       1000111  10       0111000  10  *
0010001  00       1101110  00       1110001  01  *    0001110  01  *
1000100  00       0111011  00       1001001  00       0110110  00
0010010  00       1101101  00       1001011  00       0110100  00
0100100  00       1011011  00       1101001  00       0010110  00
0010011  00       1101100  00       1010011  00       0101100  00
1100100  00       0011011  00       1100101  00       0011010  00
0100001  00       1011110  00       1100011  00  G    0011100  00  G
1000010  00       0111101  00       1100111  10  ?    0011000  10  ?
0100010  00  G    1011101  00  G    1110011  01  ?    0001100  01
0100011  00  G    1011100  00  G    1101011  00       0010100  00
1100010  00  G    0011101  00  G    1110111  00       0001000  00

G = Good, No Change
? = Probable Need to Move
* = Clear Need to Move
- = Not Enough Information
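In software, Table 3 can be modeled as a lookup. Reading the EL column as an (early, late) bit pair, only 24 of the 128 seven-sample patterns produce a non-zero output, and each of these appears in the table next to its bitwise inverse with the same EL value. The sketch therefore stores one pattern of each complementary pair and checks the inverted pattern too. This is an illustrative model of the correlation table, not the decoder circuit; the names are hypothetical.

```python
# Patterns from Table 3 whose EL output is not "00". Only one pattern of each
# complementary pair is listed; the table gives the inverted pattern the same EL.
EARLY = {"1000000", "0000110", "1000110", "0000111", "1000111", "1100111"}  # EL = 10
LATE = {"0000001", "0110001", "0110000", "1110000", "1110001", "1110011"}   # EL = 01

_INVERT = str.maketrans("01", "10")


def early_late(pattern):
    """Return (E, L) for a seven-sample pattern such as "0000110"."""
    if len(pattern) != 7 or set(pattern) - {"0", "1"}:
        raise ValueError("expected seven samples as a string of 0s and 1s")
    for candidate in (pattern, pattern.translate(_INVERT)):
        if candidate in EARLY:
            return (1, 0)
        if candidate in LATE:
            return (0, 1)
    return (0, 0)   # every other pattern in Table 3 has EL = 00
```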
The method for the phase rotator control is an advanced bang-bang
state machine. As seen at 342 in Figure 7, it involves eight-fold initial
early/late averaging. It has sixteen states and may be implemented using
four latches. Referring again to Figure 6, the state machine 342 has two
inputs, one for early and one for late. The averaging effect is achieved in
the following manner. The state machine 342 is initially set to 8. If
several early signals in a row, but not enough to drive the state to '1',
are followed by several late signals, the state machine averages them out.
However, when a preponderance of early or late signals takes the state
machine to '1' or '14', the state machine determines that the sampling is
occurring too early or too late and determines whether to change the sample
point. The state machine produces a 'late' signal when it reaches state
'1', and an 'early' signal when it reaches state '14'. If this output
signal is a 'late' signal, it instructs the rotation counter to adjust the
sampling to a later point. Conversely, an 'early' signal instructs the
counter to adjust the sampling to an earlier point.
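The averaging behavior described above can be sketched as a small state machine. From the text: the state starts at 8, early inputs move it toward 1 and late inputs toward 14; reaching 1 emits 'late' (sample later) and reaching 14 emits 'early' (sample earlier). Two details are assumptions, labeled in the code: the state returns to 8 after each output, and a cycle with both or neither input asserted leaves the state unchanged. With these thresholds, seven consecutive early inputs or six consecutive late inputs produce a command.

```python
class AveragingStateMachine:
    """Sketch of the first-embodiment rotator control state machine."""

    START, LOW, HIGH = 8, 1, 14   # initial state and the two output states

    def __init__(self):
        self.state = self.START

    def step(self, early, late):
        """Process one early/late input pair; return 'late', 'early' or None."""
        if early and not late:
            self.state -= 1       # early inputs move the state toward 1
        elif late and not early:
            self.state += 1       # late inputs move the state toward 14
        # Assumption: both or neither input asserted leaves the state unchanged.
        if self.state <= self.LOW:
            self.state = self.START   # assumption: reset to 8 after an output
            return "late"             # sampling too early: move the sample point later
        if self.state >= self.HIGH:
            self.state = self.START
            return "early"            # sampling too late: move the sample point earlier
        return None
```

Alternating early and late inputs cancel and never produce a command, which is the averaging effect the text describes.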

Figure 8 shows the operation of a second embodiment of the state
machine. This machine 380 combines early/late averaging with adaptive
behavior that changes the amount of averaging based on the number of
consecutive early or late inputs. When a sustained sequence of early or
late inputs is received, this state machine reduces the amount of averaging
in order to increase the stepping rate of the phase rotator. This state
machine contains 64 states and requires six latches. As with the prior
embodiment, the state machine is followed by an 'up/down' counter with
54 steps, requiring six flip-flops. The counter has 54 steps and controls
where the sample point will be. The counter processes two bits at a time in
parallel; thus, there are 27 positions at which the sample point can be set
for each bit. This defines the limit of the resolution. As noted, the
state machine determines whether to change the sample point and the counter
determines where the new sample point will be.
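The counter's role can be sketched as modulo arithmetic. The sketch below wraps the position modulo 54, modeling the rotator's unlimited, round-robin phase range, and shows where the 27 positions per bit come from. Mapping 'late' to +1 and 'early' to -1 is an assumed sign convention, and the function names are hypothetical.

```python
ROTATOR_STEPS = 54          # positions of the up/down counter
BITS_PER_CLOCK_PERIOD = 2   # the counter covers two bits in parallel


def next_position(position, command):
    """Move the counter one step; wrap modulo 54 for round-robin rotation.

    The mapping of 'late' to +1 and 'early' to -1 is an assumed convention.
    """
    if command == "late":
        return (position + 1) % ROTATOR_STEPS
    if command == "early":
        return (position - 1) % ROTATOR_STEPS
    return position


def positions_per_bit():
    """Sample-point positions available within one bit interval."""
    return ROTATOR_STEPS // BITS_PER_CLOCK_PERIOD


def sample_offset_ui(position):
    """Sample-point offset in bit intervals within the two-bit clock period."""
    return position * BITS_PER_CLOCK_PERIOD / ROTATOR_STEPS
```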

Referring now to Figure 9, a block diagram of a transmitter analog
architecture 410 is shown. The transmitter architecture 410 is supported by
three major analog blocks: a full data rate phase locked loop (PLL) 412, a
Phase Buffer circuit 414 to repower the PLL signal, and an off-chip Finite
Impulse Response (FIR) equalization driver circuit 416. Within the PLL 412
are a "fine" control loop circuit 427 and a "coarse" control loop.


The transmitter PLL 412 is the clock source for the transmitted data
and preferably runs at the full data rate. At full rate, less duty cycle
distortion and jitter occur, and the present embodiment of the invention is
able to run at full rate efficiently. The frequency reference is 1/n of the
target data rate. For example, for n = 4, 625 MHz is required for an
operational data rate of 2.5 Gbps. A single clock phase is buffered and
brought out of the PLL 412 and is intended to drive the Phase Buffer
circuit 414.

The PLL 412 illustrated contains a multi-stage, voltage controlled
ring oscillator (VCO) 418, a frequency divider 420, a phase-frequency
detector 422, a charge pump 424 and a multi-pole "ripple capacitor" loop
filter 426. These elements form a "fine" control loop 427. Although, in the
embodiment described herein, the VCO 418 is a four-stage oscillator and the
frequency divider 420 is a four-times divider, other stage and divider
multiples will be apparent to one skilled in the art, and the loop is not
limited to the specific four-stage oscillator and four-times divider
elements described.

The fine control loop 427 is a conventional analog loop and is intended to
provide a stable, low-noise, low-jitter clock source for the transmitter
circuit 410. The range, gain and bandwidth of the loop 427 are designed to
compensate for relatively high frequency but small perturbations due to
power supply changes and the coarse loop.
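The relationship between the reference, the data rate and the divider can be checked numerically. Assuming that a full-rate VCO runs at a frequency numerically equal to the data rate (2.5 GHz for 2.5 Gbps), a 1/n reference with n = 4 is 625 MHz, and the ratio needed to compare the VCO against it is 4, consistent with the four-times divider 420. The same function reproduces the receiver example of a 625 MHz reference at one-half of 1.25 Gbps. Function names are illustrative.

```python
def reference_frequency_mhz(data_rate_gbps, n):
    """Reference frequency (MHz) equal to 1/n of the target data rate."""
    if n <= 0:
        raise ValueError("n must be positive")
    return data_rate_gbps * 1000.0 / n


def feedback_divide_ratio(vco_mhz, reference_mhz):
    """Integer divide ratio that makes the divided VCO frequency equal the reference."""
    ratio = vco_mhz / reference_mhz
    if abs(ratio - round(ratio)) > 1e-9:
        raise ValueError("VCO frequency is not an integer multiple of the reference")
    return round(ratio)


if __name__ == "__main__":
    ref = reference_frequency_mhz(2.5, 4)          # transmitter example, n = 4
    print(ref, feedback_divide_ratio(2500.0, ref))  # 625.0 4
```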


Referring now to Figure 10, a schematic of one embodiment of the loop
filter 426 is provided. The loop filter circuit 426 illustrated is a second
order CRC low pass filter. A small "ripple" capacitor 428 is used to
attenuate charge pump ripple, and a larger "loop filter" capacitor 430 is
used to stabilize the circuit and set the dominant pole. The loop filter
circuit 426 converts the charge pump current received from the charge pump
424 into a control voltage that drives the VCO circuit 418. Resistors 432
add a zero into the circuit to null out the effect of the pole at the origin
(caused by the VCO 418). The loop filter circuit 426 also sets the dominant
pole of the circuit. The ripple capacitor 428 is much smaller than the loop
filter capacitor 430, which places its pole much higher in frequency. The
resistors 432 also factor into the open loop gain, which affects the
stability of the system and the settling time (or response time) of the
circuit. Although, in the embodiment illustrated, the VCO circuit 418 gain
ranges from 300MHz to 3.8GHz depending upon process and temperature, other
gain values may be achieved, as will be readily apparent to one skilled in
the art. Accordingly, the resistors 432 are switchable. A switch 433 is
controlled by logic based on the operation of the PLL circuit 412, said
logic preferably setting a range between 2.5GHz and 3.125GHz in the current
embodiment. Other embodiments (not shown) may have a value range greater or
smaller, or covering a different value range; the range described is for
illustrative purposes only. The VCO 418 has both a "fine" and a "coarse"
control voltage in order to minimize the required gain of the fine loop 427.
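The corner frequencies of such a filter can be estimated as follows, assuming the common passive arrangement for a second-order filter of this kind: resistor 432 (R) in series with loop filter capacitor 430 (C1), the pair shunted by ripple capacitor 428 (C2). The text gives neither the exact topology nor component values, so both the topology and the example values are hypothetical. The transimpedance is then Z(s) = (1 + sRC1) / (s(C1 + C2)(1 + sRC1C2/(C1 + C2))), with a zero at 1/(2πRC1) and an extra pole at (C1 + C2)/(2πRC1C2).

```python
import math


def loop_filter_corners(r_ohm, c_loop_f, c_ripple_f):
    """Zero and extra pole (Hz) of a series R-C1 branch shunted by C2.

    Assumed transimpedance:
        Z(s) = (1 + s*R*C1) / (s*(C1 + C2)*(1 + s*R*C1*C2/(C1 + C2)))
    with C1 the loop filter capacitor and C2 the ripple capacitor.
    """
    c1, c2 = c_loop_f, c_ripple_f
    zero_hz = 1.0 / (2 * math.pi * r_ohm * c1)
    pole_hz = (c1 + c2) / (2 * math.pi * r_ohm * c1 * c2)
    return zero_hz, pole_hz


if __name__ == "__main__":
    # Hypothetical values: 10 kOhm, 100 pF loop capacitor, 5 pF ripple capacitor.
    zero, pole = loop_filter_corners(10e3, 100e-12, 5e-12)
    print(f"zero {zero / 1e3:.1f} kHz, pole {pole / 1e6:.2f} MHz")
```

The pole lies a factor (1 + C1/C2) above the zero, so making the ripple capacitor much smaller than the loop filter capacitor places its pole well out in frequency, as the text describes.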

Referring now to Figure 11, a schematic of a four-stage delay cell
embodiment of the transmitter VCO 418 is provided. The VCO 418 itself is of
a form which adjusts the speed of oscillation by adjusting local feedback
within a plurality of delay cells 440, as well as controlling feedback
within the VCO 418, which provides pre-charge of the delay cells 440 for
speed enhancement. It is preferred that the VCO operate at 2.125GHz to
3.125GHz across a defined range of operating conditions and produce a
differential clock output. Other embodiments (not shown) may have a value
range greater or smaller, or covering a different value range; the range
described is for illustrative purposes.

In a conventional ring oscillator, the oscillation frequency is
determined as $f = 1/(2N\delta)$, where $N$ is the number of stages and
$\delta$ is the unit delay time of a delay cell. Hence, the frequency of
oscillation is decided by the delay time of one delay element. Higher
operating frequency and wider tuning range are achieved in the embodiment
illustrated in Figure 11 by implementing a dual delay scheme. Dual delay
means that both negative skewed delay paths 434 and normal delay paths 436
exist in the same oscillator. (In Figure 11 the negative skewed delay paths
434 are represented by normal lines, and the normal delay paths 436 by
thicker boldface lines.) The negative skewed delay paths 434 decrease the
unit delay time below that of a single inverter delay time. As a result, a
higher operating frequency can be obtained. Since the normal delay paths
436 also exist, the frequency range of the VCO 418 can be wider than that of
an oscillator with only skewed delay paths.
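Inverting the conventional relation $f = 1/(2N\delta)$ shows how short the unit delay must be for a four-stage ring to reach the preferred transmit range. This is a minimal arithmetic sketch of the conventional formula only; it does not model the dual delay scheme, whose negative skewed paths are what reduce the effective unit delay. The function names are illustrative.

```python
def ring_frequency_hz(n_stages, unit_delay_s):
    """Conventional ring oscillator: f = 1 / (2 * N * delta)."""
    return 1.0 / (2.0 * n_stages * unit_delay_s)


def required_unit_delay_s(n_stages, target_hz):
    """Invert the same relation to find the unit delay needed for a target frequency."""
    return 1.0 / (2.0 * n_stages * target_hz)


# Four stages, at both ends of the preferred 2.125GHz to 3.125GHz range.
for f in (2.125e9, 3.125e9):
    print(f"{f / 1e9:.3f} GHz needs a unit delay of {required_unit_delay_s(4, f) * 1e12:.1f} ps")
```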

Referring now to Figure 12, a schematic of a VCO 418 transmit delay
cell 440 is provided. It is preferred that the delay cell 440 be tunable
from 80 ps to 125 ps delay over the VCO 418 operating range. Other
embodiments (not shown) may have a value range greater or smaller, or
covering a different value range; the range described is for illustrative
purposes only. It is also preferred that the delay cell 440 produce full
swing differential outputs. At the core of the delay cell 440 is an NMOS
differential pair (T0,T2) 442 with a PMOS pair latch (T4,T5) 444 as an
active load. Cross-coupled NMOS transistors (T1,T3) 446 control the maximum
gate voltage of a pair of PMOS load transistors 448 and limit the strength
of the PMOS latch 444. When the control voltage is low, the latch 444
becomes weak, and the output driving current of the PMOS latch 444 load
increases. Therefore, the state of the latch 444 is changed easily and the
delay time is reduced. Conversely, when the control voltage is high, the
latch 444 becomes strong, and it resists the voltage switching in the
differential delay cell 440. As a result, the delay time increases. With
the help of the positive feedback of the latch 444, the transition edges of
the output waveform remain sharp in spite of the longer delay time. Since
the delay cell 440 is basically a simple differential inverter, a
full-swing waveform is generated.



To utilize both negative skewed and normal delay paths, the pair of
PMOS transistors (T6,T7) 448 is added to the PMOS loads of the delay cell
440 and used to take the negative skewed signals. The negative skewed
signal is connected to the PMOS input of the delay cell 440 and the normal
signal is connected to the NMOS input of the delay cell. The negative
skewed signal is taken from two stages before the current delay stage.
The signal turns on the PMOS early during the output transition and
compensates for the performance of the PMOS, which is usually slower than
that of the NMOS.
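The ring connectivity just described can be written down as an index mapping. This sketch assumes stages numbered 0 to N-1 around the ring and ignores the polarity crossings of a differential ring; only the rule that the normal (NMOS) input comes from the preceding stage and the negative skewed (PMOS) input from two stages earlier is taken from the text. The function name is hypothetical.

```python
def dual_delay_connections(n_stages):
    """For each stage i, return (i, normal_source, skewed_source).

    Assumes stages indexed 0..n-1 around the ring: the normal (NMOS) input comes
    from the previous stage, the negative skewed (PMOS) input from two stages
    earlier. Differential polarity crossings are ignored in this sketch."""
    return [(i, (i - 1) % n_stages, (i - 2) % n_stages) for i in range(n_stages)]


for stage, normal, skewed in dual_delay_connections(4):
    print(f"stage {stage}: NMOS input from stage {normal}, PMOS input from stage {skewed}")
```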



A second pair of NMOS transistors (T8,T9) 450 is inserted in shunt
with the original NMOS cross-coupled pair 446. These devices are smaller
and longer and, therefore, have less effect on performance. This allows for
a "fine" control of the delay cell.



Referring again to Figure 9, in addition to the fine control loop 427
elements, the PLL 412 contains a reference generator 460, a voltage
comparator 462, PLL control logic 464, a Digital to Analog Converter (DAC)
466 and a low-pass filter 468. These elements form the digital "coarse"
control loop. This digital coarse loop is used to compensate for process and
temperature to put the VCO 418 in the correct operating range. The analog
fine loop 427 is then able to lock to the reference clock and produce a
preferred stable 2.125GHz/3.125GHz clock. Other embodiments (not shown)
may have different clock values, and the values described are for
illustrative purposes only. Although the embodiment of the PLL 412
described thus far is a dual loop PLL having both "fine" and "coarse"
loops, alternative embodiments may utilize only one loop; a dual loop PLL
structure is not required. It is preferred that the reference level for
the comparator 462 is produced by a cbias circuit 411.

The coarse control loop is a digital representation of a conventional
analog control loop based on a "leaky" loop filter capacitor. That type of
loop relies on leakage from the loop filter circuit 426 to drive the control
voltage in a particular direction regardless of the frequency of the VCO
418. This leakage is compensated by the phase detector 422 and charge pump
424, which only increase the charge on the loop filter circuit 426. The
loop is stable when the charge added to the loop filter circuit 426 balances
the charge that is leaking.


The PLL control logic 464 in the coarse control loop has an up/down
counter (not shown) whose value represents the charge on the loop filter
circuit 426. This counter is slowly decremented to represent leakage. The
output of the voltage comparator 462 is high or low depending on whether the
fine control voltage is operating in the upper or lower half of its range.
To balance the leakage, the control logic 464 samples the comparator 462
output. After multiple samples showing upper range operation, the up/down
counter (not shown) is incremented to represent adding charge to the loop
filter circuit 426. The up/down counter (not shown) output is converted to a
control voltage by the DAC 466 and low-pass filter 468. The coarse control
loop is intended to compensate for manufacturing process variation and for
relatively low frequency but large changes due to power supply and
temperature drift. It is discussed more thoroughly in relation to Figures 4
to 8.
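The balance between "leakage" and "added charge" in this digital loop can be illustrated with a small behavioural simulation. It is a sketch under stated assumptions, not the circuit's actual logic: the counter width, the leak period, the number of consecutive upper-half samples required before an increment, and the toy relationship between the coarse code and the fine control voltage are all hypothetical. With these numbers the counter climbs until the fine voltage sits near the middle of its range and then dithers by one code around that point.

```python
from dataclasses import dataclass


@dataclass
class DigitalLeakyLoop:
    """Up/down counter standing in for the charge on a leaky loop filter capacitor.

    All parameter values are hypothetical and chosen only for illustration."""
    counter: int = 0
    max_code: int = 63        # hypothetical 6-bit DAC range
    leak_period: int = 16     # one "leakage" decrement every 16 samples
    up_run_needed: int = 4    # consecutive upper-half samples before an increment
    ticks: int = 0
    up_run: int = 0

    def sample(self, comparator_high):
        self.ticks += 1
        if self.ticks % self.leak_period == 0:
            self.counter = max(self.counter - 1, 0)                  # leakage
        if comparator_high:
            self.up_run += 1
            if self.up_run >= self.up_run_needed:
                self.counter = min(self.counter + 1, self.max_code)  # add charge
                self.up_run = 0
        else:
            self.up_run = 0
        return self.counter


def fine_voltage(counter, target=50.0, coarse_step=1.0, fine_gain=20.0):
    """Toy plant: the fine loop supplies whatever frequency the coarse setting
    leaves over, expressed as a fraction (0..1) of its control range."""
    return min(max((target - counter * coarse_step) / fine_gain, 0.0), 1.0)


loop = DigitalLeakyLoop()
for _ in range(2000):
    # Comparator: is the fine control voltage in the upper half of its range?
    loop.sample(fine_voltage(loop.counter) > 0.5)
print("settled coarse code:", loop.counter)
```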

Figure 20 is a block diagram of another embodiment of a dual loop PLL.
From PLL theory, it is known that for good phase noise/jitter performance,
the tuning sensitivity and the multiplication factor should be small. As a
potential solution to these problems, a two-stage reference frequency
multiplication is suggested, with an external loop filter 712 and LC
oscillator 714 in the first stage and a dual loop on-chip PLL 710 in the
second stage. The first loop filter 716 has a narrow bandwidth, which
allows the jitter transfer requirements to be met. The phase noise/jitter
performance should be dominated by the quality of the external VCO, which
may be specified or selected by the customer. The bandwidth of the second
PLL loop filter (not shown) is made as large as possible to suppress any
ring oscillator noise. This is intended to let the second loop track the
625MHz signal from the first loop, so that the first loop dominates the
overall jitter performance.
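The benefit of keeping the on-chip multiplication factor small can be put into numbers with the standard result that ideal frequency multiplication by N raises phase noise by 20·log10(N) dB. In this sketch the 2.5GHz output and the 156.25MHz single-stage reference are hypothetical; only the 625MHz first-stage output is taken from the text.

```python
import math


def multiplication_penalty_db(n):
    """Ideal phase-noise increase when a clock is multiplied by n: 20*log10(n) dB."""
    return 20.0 * math.log10(n)


# Hypothetical comparison for a 2.5GHz output.
single = multiplication_penalty_db(2.5e9 / 156.25e6)     # one on-chip loop, x16
second_stage = multiplication_penalty_db(2.5e9 / 625e6)  # on-chip loop after 625MHz, x4
print(f"x16 on chip: {single:.2f} dB; x4 on chip: {second_stage:.2f} dB")
```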

Figure 21 is a block diagram of the coarse frequency control loop 720
of Figure 20. The basic idea is to introduce a controlled amount of digital
leakage in one frequency direction. The voltage of the fine tune input is
sampled and, if a predefined level is crossed, the coarse voltage is
digitally adjusted with a D/A converter 722. With this approach, the loop
gain in one direction is essentially zero. This breaks the loop and
guarantees stability. A digital integrator (counter) 724 realizes a
low-pass function to reduce switching noise.
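The one-directional adjustment and the counter-based integrator of Figure 21 can be sketched as follows. The direction of the step, the threshold level, the integration depth and the starting code are hypothetical, and the deliberate leakage is not modeled; the point is only that samples on one side of the level never move the D/A code, and that the counter averages over several samples before each step.

```python
def coarse_dac_codes(fine_samples, level, integrate_n, start_code):
    """Hypothetical model of the Figure 21 idea.

    Each sample of the fine tune voltage above `level` advances a digital
    integrator (counter); every `integrate_n` such samples the coarse D/A code
    moves one step. Samples at or below the level never move the code, so the
    loop gain in that direction is zero."""
    code, acc, codes = start_code, 0, []
    for v in fine_samples:
        if v > level:
            acc += 1
            if acc == integrate_n:
                code += 1
                acc = 0
        codes.append(code)
    return codes


print(coarse_dac_codes([0.8] * 8 + [0.2] * 4, level=0.5, integrate_n=4, start_code=10))
# [10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 12]
```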
Referring again to Figure 9, a Phase Buffer circuit 414 comprises
phase pre-drive circuits 470, phase buffer/delay circuits 472 and a transmit
phase buffer latch 474. The phase buffers 472 drive out to the latch 474
and thereby provide the clock necessary for the full rate design of the
present embodiment. The phase buffers 472 must also provide adequate rise
and fall times taking into account the estimated net loading.

The phase buffers 472 may comprise any circuits that drive clocks from
sources to circuits that have high capacitive loading due to wiring and/or
gate loading. At the clock rates used in the present invention, phase
buffers 472 are important in assuring reasonable rise and fall times, duty
cycle, and jitter performance of system clocks. The phase buffers 472 are
described in more detail later in this specification in the description of
the receiver PLL circuitry.

One embodiment of an equalization driver circuit 416 is illustrated in
Figure 9. The equalization driver circuit 416 is a Finite Impulse Response
(FIR) equalization driver comprising current-mode differential drive
circuits that are controlled by a FIR-type filter function. It is preferred
to equalize the transmitter data stream as a means of minimizing the amount
of inter-symbol interference created by copper skin effect and circuit card
dissipation factor; the former is related to the square root of the
operating frequency, the latter linearly to the operating frequency.
The transmitter FIR circuit 416 is described in detail in the related U.S.
patent application entitled "Programmable Driver/Equalizer with Alterable
Analog Finite Impulse Response (FIR) Filter Having Low Intersymbol
Interference & Constant Peak Amplitude Independent of Coefficient Settings"
(Docket No. RAL920000097US1), Serial No. 09/749908, filed December 29, 2000,
incorporated by reference herein. Other types of equalization driver
circuits may be used and the driver circuit described is for illustrative
purposes only.
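As a generic illustration of transmit FIR equalization (not the current-mode circuit of the incorporated application), the sketch below applies a two-tap filter to a sequence of NRZ symbols. The tap values are hypothetical; their absolute values sum to one so that the peak output amplitude stays fixed, loosely mirroring the "constant peak amplitude" property named in the referenced title. Transitions are driven at full swing while runs of identical bits are attenuated, which pre-compensates the high-frequency loss of the channel.

```python
def fir_equalize(symbols, taps):
    """Transmit FIR: y[n] = sum_k taps[k] * x[n-k], with x = 0 before the first symbol."""
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for k, c in enumerate(taps):
            if n - k >= 0:
                acc += c * symbols[n - k]
        out.append(acc)
    return out


# Hypothetical two-tap setting: main cursor 0.75, post-cursor -0.25.
print(fir_equalize([1, 1, 1, -1, -1, 1], [0.75, -0.25]))
# [0.75, 0.5, 0.5, -1.0, -0.5, 1.0]
```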

Referring now to Figure 13, a block diagram of receiver analog
architecture 500 is shown, comprising a half-data rate PLL circuit 501 and
an analog receiver circuit block 502. The analog receiver circuit block 502
comprises a Phase Pre-Drive 504, Phase Rotator Circuits 506 and associated
phase rotator bias circuits 507, a Phase Buffer circuit 508 to repower the
PLL signals, six sampling latches 510, and latch buffer 512 driving receiver
logic 513. Providing six latches allows the circuit to take three samples
per bit of data at half-data rate. The sampling latches 510 are also
interfaced with a receiver circuit 514 of a differential type containing
fixed input bias 516 (for power savings), which translates the input signal
to one compatible with a high speed differential latch. The output circuits
are powered up to support the necessary loading from the latches and
wiring.
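The "three samples per bit" figure follows from simple timing arithmetic: at half-data rate one clock period spans two bits, and six equally spaced phases divide that period into six sampling instants. The sketch below assumes the 2.5Gb/s rate used elsewhere in this receiver description; the function name is illustrative.

```python
def sampling_plan(data_rate_bps, n_phases=6, bits_per_clock=2):
    """Half-rate oversampling: one clock period spans `bits_per_clock` bits and
    `n_phases` equally spaced sampling phases."""
    clock_hz = data_rate_bps / bits_per_clock
    period_s = 1.0 / clock_hz
    phase_spacing_s = period_s / n_phases
    samples_per_bit = n_phases // bits_per_clock
    return clock_hz, phase_spacing_s, samples_per_bit


clock, spacing, per_bit = sampling_plan(2.5e9)
print(f"clock {clock / 1e9:.2f} GHz, phases every {spacing * 1e12:.1f} ps, {per_bit} samples per bit")
```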

An embodiment of the receiver circuit 514 is illustrated in Figure 14.
It is designed to supply a required differential output voltage to six
sample latches from an input differential voltage bitstream operating at
2.5Gb/s. The preferred requirements for the receiver circuit 514 are noted
in Table 4 below. The measured results were taken at the operating
condition that yielded the worst performance, with 150mVp-p additional noise
on VDD. All results are on a per-link basis for the fully extracted
receiver. It is to be understood that other embodiments (not shown) may
have different requirements, and the values described are for illustrative
purposes only.


Table 4: Receiver Circuit Specifications

Specification                Requirement    Measured      Operating Cond.
---------------------------  -------------  ------------  --------------------
Maximum Current              6mA            6.6mA         1.98V, 25°C, ASICBC
Jitter from Power Supply     13ps           24.6ps        1.62V, 125°C, ASICWC
  Noise and Process
  Limitations
Minimum Differential P-P     100mV          100mV         1.62V, 125°C, ASICWC
  Input
Minimum Differential P-P     800mV          858mV         1.62V, 125°C, ASICWC
  Output
Output Common Mode           0.9V-1.3V      0.95V-1.2V    all conditions
Bandwidth                    not specified  918MHz        1.62V, 125°C, ASICWC
DC Gain                      not specified  10.5          1.62V, 125°C, ASICWC
Input Common Mode Range      not specified  0.6V-1.6V     1.62V, 125°C, ASICWC


Receiver circuit 514 comprises a bias network and two differential
amplifiers 520. A CBIAS cell 522 provides a DC reference voltage for a PMOS
transistor 524 that is then converted to a reference voltage for an NMOS
transistor 526. Two stages of amplification were chosen in order to
maximize gain and bandwidth; however, the invention is not limited to two
stages.


Figure 15 is a schematic view of the differential amplifier 520 of
Figure 14. It is a traditional design with an NMOS tail current and
resistive loading to give the necessary bandwidth. The NMOS tail 531
mirrors off the 100µA CBIAS current to provide approximately 3mA to the
diff-pair 532. This 3mA is based on the maximum allowable current for the
receiver. The size of the resistors 530 was chosen to provide the necessary
output common mode voltage based on the 1.5mA pulled through each. The
input transistors 532 were then sized to achieve a gain of approximately
20dB.
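A first-order version of these sizing steps can be written out as arithmetic. The 3mA tail current, the 1.5mA per branch and the roughly 20dB gain come from the text; the 1.8V supply (the nominal VDD listed in Table 5) and a 1.1V output common-mode target (the middle of the 0.9V-1.3V requirement in Table 4) are assumptions, and the gain is approximated as gm·R with output resistance ignored.

```python
def diff_pair_sizing(vdd, vocm_target, tail_current_a, gain_db):
    """First-order sizing of a resistively loaded differential pair.

    Each load resistor carries half the tail current, so
    R = (VDD - Vocm) / (I_tail / 2). The differential gain is approximated
    as gm * R (output resistance ignored), giving the gm needed per input device."""
    i_branch = tail_current_a / 2.0
    r_load = (vdd - vocm_target) / i_branch
    gain_v_per_v = 10.0 ** (gain_db / 20.0)
    gm_needed = gain_v_per_v / r_load
    return r_load, gm_needed


# 3mA tail and 20dB gain from the text; 1.8V VDD and 1.1V common mode are assumptions.
r, gm = diff_pair_sizing(1.8, 1.1, 3e-3, 20.0)
print(f"R = {r:.0f} ohm per side, gm = {gm * 1e3:.1f} mS per input device")
```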

Figure 16 is a schematic diagram of an exemplary sampling latch 510
referred to by Figure 13. The sample latches 510 are fed data by the input
receiver circuit 514 and obtain clocks from the combination of the PLL
circuit 501, phase rotator circuit 506 and phase buffer complex 508. The
data input to the sample latches 510 is differential in nature and, as such,
the sample latches 510 are pseudo analog circuits. It is important that the
design of the input receiver and the sample latches be very closely
coordinated to minimize the effects of noise on the jitter associated with
these two circuits.

The latch 510 illustrated in Figure 16 is a CMOS, positive-edge-triggered
latch circuit. It takes differential data inputs and a single-ended clock
and outputs a single-ended, logic level signal. The complex consists of two
circuits, the latch 540 itself and a buffer 542 that sharpens the output of
the latch 540. The latch 540 receives its differential data from the
receiver circuits 514, performs differential-to-single-ended conversion on
it, and drives the output to the receive logic 513.

With CLK-Q delay <300ps (nominal) and a sample and hold window <35ps
as performance boundaries, an embodiment of the latch circuit 510
illustrated in Figure 16 was simulated over various process, temperature and
supply conditions with varying loads. The appropriate parameters were
measured to ensure adequate performance over these conditions. Also,
simulations were performed to determine the setup and hold window, the
metastability window, and the jitter performance of the latch 510. The
following Table 5 demonstrates various performance parameters of the latch
circuit 510.
Table 5: Latch Operating Parameters

Operating Conditions         CLK-Q delay (ps)  tr (ps)  tf (ps)
---------------------------  ----------------  -------  -------
TT, T=50C, VDD=1.8,          187               37       34
  Load=30fF, Nominal CLK
ASICWC, T=125C, VDD=1.62,    297               56       52
  Load=40fF, Slow CLK
ASICBC, T=25C, VDD=1.98,     129               29       26
  Load=20fF, Fast CLK


The sampling latch circuit 510 has a negative setup and hold window.
It was measured with respect to the output of the latches 510 (and not with
respect to the output of the latch buffer 512). Any CLK-data delay that
resulted in more than 300ps CLK-Q delay was also included in this window
calculation. The preferred sample and hold window for this latch is 10ps.


Referring again to Figure 13, the receiver PLL circuit 501 is the
clock source for oversampling the receive data and runs at half the data
rate. A frequency reference of 1/n of the target data rate is required;
for example, for n=2, 625MHz is required for an operational data rate of
1.25Gbps. Six clock phases are buffered and brought out of the PLL and are
intended to drive into the Phase Rotator circuit 506.

The receive PLL 501 of Figure 13 has a six-stage voltage controlled
ring oscillator (VCO) 550, a 2X frequency divider 552, phase-frequency
detector 554, charge pump 556 and multi-pole loop filter 558. These
elements form the "fine" control loop. The receive VCO 550 has both a
"fine" and a "coarse" control voltage in order to minimize the required gain
of the fine loop. In addition to the fine control loop elements, the
receive PLL 501 contains a reference generator 560, a voltage comparator
562, PLL control logic 564, a Digital to Analog Converter (DAC) 566 and a
low-pass filter 568. These elements form the "coarse" control loop.

The fine control loop 559 is a conventional analog loop and is
intended to provide a stable low-noise, low-jitter clock source for the
receiver. The range, gain and bandwidth of the loop are designed to
compensate for relatively high frequency but small perturbations due to
power supply changes and the coarse loop.

The coarse control loop is a digital representation of a conventional
analog control loop based on a "leaky" loop filter capacitor. That type of
loop relies on leakage from the "loop filter cap" to drive the control
voltage in a particular direction regardless of the frequency of the receive
VCO 550. This leakage is compensated by the phase detector 554 and charge
pump 556, which only increase the charge on the "cap." The loop is stable
when the charge being added to the cap balances the charge that is leaking.


The receive PLL control logic 564 in the coarse control loop has an
up/down counter (not shown) whose value represents the charge on a loop
filter cap. This counter is slowly decremented to represent leakage. The
voltage comparator 562 is high or low depending on whether the fine control
voltage is operating in the upper or lower half of its range. To balance
the leakage, the receive PLL control logic 564 samples the comparator 562
output. After multiple samples showing upper range operation, the up/down
counter is incremented to represent adding charge to the loop filter cap.
The up/down counter output is converted to a control voltage by the DAC 566
and low-pass filter 568. The coarse control loop is intended to compensate
for manufacturing process and relatively low frequency but large changes due
to power supply and temperature drift.

It is preferred that the receive PLL 501 operate from about 1GHz to
about 1.6GHz across a range of operating conditions, and that it produce six
evenly spaced phases. The digital coarse loop is used to compensate for
process and temperature to put the receive VCO 550 in the desired operating
range. The lower bandwidth analog fine loop is then able to lock to the
reference clock and produce six stable 1.0GHz to 1.6GHz phases. Other
embodiments (not shown) may have a value range greater or smaller, or
covering a different value range; the range described is for illustrative
purposes only. The reference level for the comparator 562 is produced by
cbias (not shown).

Figure 17 is a schematic of a receive six-stage VCO 550 structure of
Figure 13 with dual delay paths, comprising six delay cells 552. The
function of the dual delay path oscillator has been previously discussed
with respect to the transmit VCO 418 and delay cells 440.

The phase rotator 506 of Figure 13 is an analog circuit and, as such,
is a device allowing a step by step, glitch-free modulo shift of all n
phases of the receive VCO 550 at the input to any phase angle at the output.
The modulo option guarantees phase and frequency compensation capability,
the glitch-free performance assures that no bits are lost during rotation,
and 'step by step' means that the amount of phase change is limited to one
phase slice for each clock cycle.
The concept of the phase rotator 506 is based on FIR filter
principles. The receive VCO 550 may be seen as a circular array of delay
elements. By multiplying the outputs $t_n$ of the array with weighting
factors $m_n$ and summing the values, $\sum_n m_n t_n$, an FIR filter is
built. The number of taps determines the amount of oversampling and,
therefore, the order of an analog filter required for alias filtering. If
the weighting factors may be changed dynamically, the FIR filter response
may be changed 'on the fly'. This allows the dynamic adjustment of the
output phase of such a filter.
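The FIR view of phase rotation can be illustrated with a phasor model that treats each tap as a sinusoid at the clock frequency, offset by 360/n degrees from its neighbour. This is a fundamental-frequency approximation assumed for illustration (the real circuit sums weighted currents of clock waveforms that carry harmonics), and the function name is hypothetical. Equal weights on two adjacent taps place the output phase midway between them, and shifting weight toward one tap pulls the output phase toward it.

```python
import cmath
import math


def rotator_output_phase_deg(weights, n_taps):
    """Phasor model of a weighted sum of ring-oscillator taps.

    Tap n is modeled as a unit phasor at n * 360/n_taps degrees; the output phase
    is the angle of sum_n m_n * exp(j * theta_n), reported in the range 0..360."""
    total = sum(m * cmath.exp(1j * math.radians(n * 360.0 / n_taps))
                for n, m in enumerate(weights))
    return math.degrees(cmath.phase(total)) % 360.0


print(rotator_output_phase_deg([1, 0, 0, 0, 0, 0], 6))  # tap 0 alone
print(rotator_output_phase_deg([1, 1, 0, 0, 0, 0], 6))  # equal weights on taps 0 and 1
print(rotator_output_phase_deg([1, 3, 0, 0, 0, 0], 6))  # weight shifted toward tap 1
```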



It is preferred that the phase rotator 506 receive all six phases from
the receive VCO 550 and provide a step by step shift of all six phases to
any of 54 possible phase angles at the output. Thus, it will rotate all six
phases in 6.67 degree steps which, for a 2.5Gb/s system, corresponds to
14.8ps. By taking specific weights of each phase, the phase rotator 506
outputs 6 shifted phases. The phases are generated in differential pairs
and then passed through three stages of phase buffers 508 before entering
the sampling latches 510. Each phase rotator 506 is controlled by 54 lines
from logic, which adjust the current weights for each phase contribution.
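The step sizes quoted above can be reproduced with the following arithmetic, assuming the half-rate clock implied by the receive PLL running at half the data rate (a 1.25GHz clock, or 800ps period, at 2.5Gb/s). The function name is illustrative.

```python
def rotator_step(data_rate_bps, n_positions=54, bits_per_clock=2):
    """Rotator phase step in degrees and seconds for a half-rate sampling clock."""
    step_deg = 360.0 / n_positions
    clock_period_s = bits_per_clock / data_rate_bps   # 800ps at 2.5Gb/s, half rate
    step_s = clock_period_s / n_positions
    return step_deg, step_s


deg, sec = rotator_step(2.5e9)
print(f"{deg:.2f} degrees per step = {sec * 1e12:.1f} ps")
```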



The receive phase buffers 508 consist of circuits which are designed
to interface to the output drive sections (all phases) of the phase rotator
circuit 506 while subjecting the phase rotator 506 to only light loading.
The phase buffers 508 then drive from the phase rotator 506 to the sampling
latches 510 while providing the required input drive necessary for the phase
rotator circuit 506. It is preferred that the receive phase buffers 508
operate at a rate necessary for a half rate design. It is also preferred
that the phase buffers 508 provide adequate rise and fall times taking into
account the estimated net loading.

The receive phase buffers 508 may include any circuits that drive
clocks from sources to circuits that have high capacitive loading due to
wiring and/or gate loading. For the receive PLL 501, it is preferred that
the phase buffers 508 allow equal loading on the individual delay stages,
and provide the drive capability to fan out the clock phases from a single
PLL to four transmit/receive cores. At the clock rates used in the present
embodiment, phase buffers 508 are important in assuring reasonable rise and
fall times, duty cycle, and jitter performance of system clocks.



A preferred embodiment utilizes two phase buffer 508 circuit
topologies. The first is a pseudo-differential positive feedback latching
stage referred to as the latch buffer 580, shown in Figure 18. The second
topology is simply a pair of inverters and is referred to as the inverter
buffer 600, shown in Figure 19. The two buffer types are used for different
applications. For higher power, jitter critical paths, the latch buffer 580
is used because of the circuit's power supply rejection qualities. This
includes buffering the differential phases coming out of the receive PLL
circuit 501, going into the Phase Rotator 506, and coming out of the Phase
Rotator 506. The inverter buffers 600 are used primarily to buffer single
ended clocks to logic level circuits, including core logic and sampling
latches 510.

Referring now to Figure 18, the latch buffer 580 operates with
positive feedback through cross-coupled n-channel devices to provide a very
fast transition. This helps reject power supply noise because the
transition timing is a function of the incoming differential signal. It
avoids using just one of the single ended sides to determine when to
transition (as an inverter stage would) and, therefore, avoids relying on
the supply to be steady. One drawback of this circuit is a significant DC
current draw that normal inverters do not have. Another drawback is the
lack of a rail to rail output. In the embodiment shown, the p-channel
devices are always on, causing the down-level to only approach about 200mV.

Referring now to Figure 19, the inverter buffer 600 relies on using
pairs of inverter stages 602 to track mismatches in p- to n-channel devices.
This greatly improves jitter performance through the inverter stages 602.
Whenever the inverter buffer 600 is used to ramp up the driving capability
of a circuit, the general rule of increasing inverter sizes exponentially,
by a factor of e per stage, was used. This keeps rise and fall times
consistent through all stages of inverter chains. Since jitter is basically
a linear function of rise and fall time, this prevented excessive jitter at
any one stage. To maintain the duty cycle of the clocks, the p- to n-channel
ratio was selected in the embodiment shown in Figure 19 to be 2.5 in order
to match the approximate drive mismatch of the two devices in 7SF. It is
preferred that the inverters are sized at minimal length to maximize speed
performance.
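The sizing rule can be sketched as follows: for a load F times the first-stage input capacitance, choosing N = round(ln F) stages gives a per-stage scale factor F^(1/N) close to e, and each stage's p-channel width is set to 2.5 times its n-channel width as stated above. The load ratio of 400, the unit first-stage width and the function name are hypothetical.

```python
import math


def inverter_chain(fanout_ratio, pn_ratio=2.5):
    """Taper an inverter chain that drives a load `fanout_ratio` times the input
    capacitance of its first stage.

    N = round(ln F) stages make the per-stage scale factor F**(1/N) close to e.
    Returns the scale factor and (nmos_width, pmos_width) for each stage,
    normalized to a first-stage NMOS width of 1."""
    n_stages = max(1, round(math.log(fanout_ratio)))
    scale = fanout_ratio ** (1.0 / n_stages)
    widths = [(scale ** k, pn_ratio * scale ** k) for k in range(n_stages)]
    return scale, widths


scale, stages = inverter_chain(400.0)   # hypothetical load of 400x the first stage
print(f"{len(stages)} stages, scale factor {scale:.3f} per stage")
for wn, wp in stages:
    print(f"  Wn = {wn:7.2f}  Wp = {wp:7.2f}")
```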


The characteristics of the phase buffers 508 are measured primarily by
power usage and jitter. In most cases, it is preferable to trade increased
power usage for better jitter performance. Table 6 illustrates jitter and
power numbers for exemplary embodiments of the Phase Buffers 472 and 508.
The simulated jitter numbers were based on power supply noise. For the
transmit Phase Buffers 472, the noise level was 75mVp-p. For the receive
Phase Buffers 508, the noise level was 150mVp-p. All numbers are for
2.5Gbps operation, on a per link basis.



Table 6: XMT and RCV Phase Buffer Performance (at 2.5Gbps)

Test Conditions                  POWER SPEC  POWER SIM  JITTER SPEC  JITTER SIM
-------------------------------  ----------  ---------  -----------  ----------
RCV PB, ASICBC, 1.98V VCC, 0C    6.6mW       13.2mW     8ps PP       1.2ps PP
RCV PB, TYP, 1.8V VCC, 62.5C                 9.2mW                   2.6ps PP
RCV PB, ASICWC, 1.62V VCC, 125C              6.3mW                   5.2ps PP
XMT PB, ASICBC, 1.98V VCC, 0C    1.8mW       6.8mW      8ps PP       6.8ps PP
XMT PB, TYP, 1.8V VCC, 62.5C                 4.9mW                   14.4ps PP
XMT PB, ASICWC, 1.62V VCC, 125C              3.9mW                   18.5ps PP



Referring now to Figure 22, a block diagram of the topography of an
embodiment of the phase rotator circuits 506, associated cbias circuits 507
and phase buffer circuits 508 is shown. The phase rotator 506 comprises
phase rotator currents buffer circuits 610, phase rotator current circuits
612 and phase rotator core circuits 614. The phase buffer circuits 508
comprise phase buffer core circuits 618 and phase buffer post-buffer
circuits 620. The phase rotator circuits 506, associated cbias circuits 507
and phase buffer circuits 508 are more fully described in U.S. patent
application Serial No. 09/861,668, filed May 22, 2001, by Schmatz, entitled
"Phase Rotator and Data Recovery Receiver Incorporating said Phase Rotator",
the entire disclosure of which is incorporated by reference herein.
Exemplary schematic diagrams of elements of Figure 22 are provided as
follows.


Figure 23 provides an exemplary schematic diagram of the phase rotator
cbias circuit 507.

Figure 24 provides an exemplary schematic diagram of the phase rotator
currents buffer circuit 610.

Figure 26 provides an exemplary schematic diagram of the phase rotator
current circuit 612.

Figure 28 provides an exemplary schematic diagram of the phase rotator
core circuit 614.

With respect to the phase buffer circuits 508, Figure 29 provides an
exemplary schematic diagram of the phase rotator buffer core circuit 618,
and Figure 30 provides an exemplary schematic diagram of the phase rotator
buffer post-buffer circuit 620.

Block diagrams have also been provided to more clearly illustrate
phase rotator 506 and phase buffer circuitry 508. Figure 25 is a block
diagram of a phase rotator currents buffer 610 six pack 611.

Figure 27 is a block diagram of a phase rotator core circuit 614 six
pack 615.

Figure 31 shows another embodiment featuring a basic FIR filter 632
approach with eight taps t1 to t8 from an eight stage/phase ring oscillator
630. Five different weighting factors m0 to m4 are assumed to be available,
and they are built by summing sub-factors w1 to w4. Table 7 shows the
initial configuration for the weighting factors.

Table 7: Configuration of the weighting factors m0 to m4 from sub-factors w1 to w4

Weighting factor  Configuration
----------------  ---------------------------------------
m0                = 0 (not used in initial configuration)
m1                = w1
m2                = w1 + w2
m3                = w1 + w2 + w3
m4                = w1 + w2 + w3 + w4

Figure 32 shows the stepwise change of output phase by sequentially
changing the weighting factors that determine the contribution from each
phase tap to the actual output. In step (a), for example, the weighting
factor at tap t1 is changed from w1 to w1+w2 and, at the same time, the
weight at tap t8 is changed to zero. For a suitable setting of the weight
values w1 to w4, this will shift the output phase by exactly one-fourth of a
phase slice. After the last rotating step (d), all weights have been
shifted by one tap position. This corresponds to a shift of one phase slice
at the output of the FIR.

By repetition of the above sequence, any phase setting may be tuned
in. Because this is a circular operation, the range of the output phase is
not limited to the 0 to 360 degree interval. This allows a continuous
variation of the phase and thereby a frequency adjustment. Because the
weighting factors are only changed by adding or subtracting one sub-factor
element at a time, no glitches can occur.
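The frequency adjustment obtained from continuous rotation can be quantified: each full 360 degree turn slips the output by one clock cycle, so stepping once every K cycles through P positions per turn shifts the frequency by f/(K·P). The sketch uses the 54 positions and the 1.25GHz half-rate clock of the six-phase receive embodiment described earlier; the step interval of 1000 cycles and the function name are hypothetical.

```python
def rotation_frequency_offset(clock_hz, positions_per_turn, cycles_per_step):
    """Frequency offset from continuous rotation: stepping once every
    `cycles_per_step` clock cycles, one full turn of `positions_per_turn` steps
    slips the output by one clock cycle. Returns (offset in Hz, offset in ppm)."""
    steps_per_second = clock_hz / cycles_per_step
    offset_hz = steps_per_second / positions_per_turn
    return offset_hz, offset_hz / clock_hz * 1e6


hz, ppm = rotation_frequency_offset(1.25e9, 54, 1000)
print(f"{hz:.0f} Hz offset ({ppm:.2f} ppm)")
```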
A simplified schematic for a six-phase phase rotator 640 according to
the present invention is provided in Figure 33. With six phase slices, four
possible weighting factors m0 to m3 are built by variable summation of the
three sub-factors w0 to w2. A thermometer code logic generates the control
signals for the wired summation of currents. This allows the generation of
eighteen phase steps for one 360 degree rotation from a three stage
differential ring oscillator. The outputs of the FIR blocks are preferably
summed by a wired n- function. In order to generate high quality clock
signals, it is preferred that differential clock buffers are used.
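The control side of the six-phase rotator can be sketched as a mapping from a phase step number to a phase slice plus a thermometer code over the three sub-factors; 6 slices times 3 sub-factors gives the eighteen steps per rotation, or 20 degrees per step. The mapping and the function names are hypothetical; the text specifies only that thermometer code logic drives the wired summation of currents.

```python
def thermometer(k, width):
    """Thermometer code: the first k of `width` control lines are on."""
    return [1] * k + [0] * (width - k)


def rotator_setting(step, n_slices=6, n_subfactors=3):
    """Hypothetical mapping of a phase step to (phase slice, thermometer code).

    Steps wrap modulo n_slices * n_subfactors (18 for the six-phase rotator);
    the code says how many sub-factors have moved toward the next slice."""
    step %= n_slices * n_subfactors
    slice_index, k = divmod(step, n_subfactors)
    return slice_index, thermometer(k, n_subfactors)


steps_per_turn = 6 * 3
print(f"{steps_per_turn} steps per turn, {360 / steps_per_turn:.0f} degrees per step")
for s in range(5):
    print(s, rotator_setting(s))
```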

Figure 34 provides a detail view of one of the phase rotator circuit 
blocks 642 of Figure 33. 

While preferred embodiments have been described herein, variations
in the design may be made, and such variations may be apparent to those
skilled in the art of making tools, as well as to those skilled in other
arts. The performance and signal specifications identified above are by
no means the only specifications suitable for the method and system of the
present invention, and substitute specifications will be readily apparent
to one skilled in the art. The scope of the invention, therefore, is only
to be limited by the following claims.



CLAIMS

1. A method of transferring stored digital parallel data of multiple bits
of data stored in a first data register from a transmitter to a receiver
over a hard wired conductor comprising the steps of:

synchronously converting said stored digital data to a serial analog 
data signal in said transmitter; 

transmitting said serial analog signal asynchronously over said hard 
wired conductor to said receiver; and 
10 restoring said asynchronous serial analog signal to synchronous 

digital parallel data in said receiver corresponding to the data stored in 
said first data register in said transmitter, including detecting both edges 
of the data in said asynchronous serial analog signal for conversion to 
parallel data bits. 

15 

2. A method according to claim 1 wherein the digital parallel data is 
read out of said first data register to at least one single bit latch. 

3. A method according to claim 1 or 2 wherein the data is read out from
said first register in said transmitter two bits at a time, each data bit to
first and second single data bit registers, and from each first and second
single bit data register to a third single bit data register, clocking
additional two data bits to be subsequently written to said first and second
one bit registers and to said third single bit data register until all bits
of the data have been read from the first register.

4. A method according to claim 3 wherein the bits from the third single
bit register are converted to a single analog serial signal of the data.

5. A method according to any preceding claim wherein the data in said
first register is comprised of either eight or ten bits. 

6. A method according to any preceding claim wherein a clocking signal is 
used to convert said analog serial signal to a digital signal. 


7. A method according to claim 6 wherein said analog signal is converted 
in said receiver to two one-bit signals and delivered to a shift register 
and then stored in a second data register. 

8. A method according to claim 7 wherein said bits in the shift register
are delivered synchronously from said shift register to said second data 
register. 
9. A method according to any preceding claim wherein said edges are
derived from multiple samples. 

10. A method according to claim 9 wherein said multiple samples are used
to determine the approximate center of said resulting data bit.

11. A structure for transferring stored digital parallel data of multiple
bits of data stored in a first data register, comprising a transmitter and a
receiver connected by a hard wired conductor;

circuitry to synchronously convert said stored digital data to a
serial analog data signal in said transmitter;

circuitry to transmit said serial analog signal asynchronously over
said hard wired conductor to said receiver; and

circuitry to restore said asynchronous serial analog signal to
synchronous digital parallel data in said receiver corresponding to the data
stored in said first data register in said transmitter, including detecting
both edges of the data in said asynchronous serial analog signal for
conversion to parallel data bits.

12. A structure according to claim 11 including at least one single bit
latch and circuitry to read the digital parallel data out of said first data
register to said at least one single bit latch.

13. A structure according to claim 11 or 12 including first, second and
third single data bit registers, and wherein the data is read out from said
first register in said transmitter two bits at a time, each data bit to
either said first or second single data bit registers, and then from each
first and second single bit data register to said third single bit data
register, clocking to clock additional two data bits to be subsequently
written to said first and second one bit registers and to said third single
bit data register until all bits of the data have been read from the first
register.

14. A structure according to claim 13 including circuitry to convert the
bits from the third single bit register into a single analog serial signal
of the data.

15. A structure according to any of claims 11 to 14 wherein the data in 
said first register is comprised of either eight or ten bits. 


16. A structure according to any of claims 11 to 15 including a clocking 
signal to convert said analog serial signal to a digital signal. 
17. A structure according to any of claims 11 to 16 including a second
data bit register and circuitry in said receiver to convert said analog
signal to two one-bit signals delivered to a shift register, and store the
converted bits in said second data register.

18. A structure according to claim 17 wherein said bits in the shift 
register are delivered synchronously from said shift register to said second 
data register. 

19. A structure according to any of claims 11 to 18 including circuitry to
derive said edges from multiple samples.



20. A structure according to claim 19 wherein said circuitry to derive 
said edges from said multiple samples determines the approximate center of 
said resulting data bit. 
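
The transfer recited in claims 1 to 10 can be illustrated with a behavioural Python model. It is a sketch under assumptions, not the claimed circuitry: bits are sent least-significant first, the line is noise-free, the receiver takes four samples per bit, the stream begins within half a bit of a bit boundary, and "both edges" is read as both rising and falling data transitions. All function names are hypothetical.

```python
from typing import List


def serialize(word: int, width: int = 10) -> List[int]:
    """Transmitter (claims 1-4): read the stored word out two bits at a time.

    Each pair is latched into two single-bit registers and then passed, one
    bit at a time, through a third single-bit register onto the line.
    """
    if width % 2:
        raise ValueError("width must be even for two-bits-at-a-time readout")
    line = []
    for pos in range(0, width, 2):
        first = (word >> pos) & 1         # first single-bit register
        second = (word >> (pos + 1)) & 1  # second single-bit register
        line.extend([first, second])      # third register drives the line
    return line


def oversample(bits: List[int], ratio: int = 4, skew: int = 1) -> List[int]:
    """Idealised receiver front end: `ratio` samples per bit, starting `skew`
    samples after a bit boundary (the receiver does not know `skew`)."""
    return [b for b in bits for _ in range(ratio)][skew:]


def recover_bits(samples: List[int], ratio: int = 4) -> List[int]:
    """Claims 9-10: locate data edges from multiple samples and keep the
    sample nearest the centre of each bit."""
    # Any change between neighbouring samples marks a rising or falling data edge.
    edges = [i for i in range(1, len(samples)) if samples[i] != samples[i - 1]]
    if not edges:
        raise ValueError("no data edges: bit phase cannot be estimated")
    # Majority vote of edge positions modulo the ratio gives the bit-boundary phase.
    votes = [0] * ratio
    for e in edges:
        votes[e % ratio] += 1
    boundary = votes.index(max(votes))
    first_centre = (boundary + ratio // 2) % ratio
    return samples[first_centre::ratio]


def deserialize(bits: List[int], width: int = 10) -> int:
    """Receiver (claims 7-8): split the recovered bits into two one-bit streams,
    shift them two at a time into a shift register, then transfer the full
    word synchronously to the second data register."""
    if len(bits) != width:
        raise ValueError(f"expected {width} bits, got {len(bits)}")
    even, odd = bits[0::2], bits[1::2]
    shift_reg = 0
    for e, o in zip(even, odd):
        shift_reg = (shift_reg >> 2) | (e << (width - 2)) | (o << (width - 1))
    second_register = shift_reg  # parallel transfer once the register is full
    return second_register


if __name__ == "__main__":
    word = 0b1011001110  # hypothetical 10-bit value
    received = deserialize(recover_bits(oversample(serialize(word))))
    print(f"{word:010b} -> {received:010b}")
```

A majority vote over edge positions is only one simple way to place the sampling point; the claims state that the edges are derived from multiple samples and used to find the approximate bit centre, but do not say how the samples are combined.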